Introduction


This  book  originates  from  a  collection  of  lecture  notes  that  the  first  author  prepared 
at  the  University  of  Trieste  with  Michela  Brundu,  over  a  span  of  fifteen  years, 
together  with  the  more  recent  one  written  by  the  second  author.  The  notes  were 
meant  for  undergraduate  classes  on  linear  algebra,  geometry  and  more  generally 
basic  mathematical  physics  delivered  to  physics  and  engineering  students,  as  well 
as  mathematics  students  in  Italy,  Germany  and  Luxembourg. 

The  book  is  mainly  intended  to  be  a  self-contained  introduction  to  the  theory  of 
finite-dimensional  vector  spaces  and  linear  transformations  (matrices)  with  their 
spectral  analysis  both  on  Euclidean  and  Hermitian  spaces,  to  affine  Euclidean 
geometry  as  well  as  to  quadratic  forms  and  conic  sections. 

Many  topics  are  introduced  and  motivated  by  examples,  mostly  from  physics. 
They  show  how  a  definition  is  natural  and  how  the  main  theorems  and  results  are 
first  of  all  plausible  before  a  proof  is  given.  Following  this  approach,  the  book 
presents  a  number  of  examples  and  exercises,  which  are  meant  as  a  central  part  in 
the  development  of  the  theory.  They  are  all  completely  solved  and  intended  both  to 
guide  the  student  to  appreciate  the  relevant  formal  structures  and  to  give  in  several 
cases  a  proof  and  a  discussion,  within  a  geometric  formalism,  of  results  from 
physics,  notably  from  mechanics  (including  celestial)  and  electromagnetism. 

Since the book is intended mainly for students in physics and engineering, we
tasked ourselves not to present the mathematical formalism per se. Although we
decided, for the sake of clarity, to organise the basics of the theory in the
classical terms of definitions, with the main results stated as theorems or propositions, we
often do not follow the standard sequential form of definition, theorem, corollary,
example, and we provide some two hundred and fifty solved problems given as
exercises.

Chapter 1 of the book presents the Euclidean space used in physics in terms of
applied vectors with respect to an orthonormal coordinate system, together with the
operations of scalar, vector and mixed product. These are used both to describe the
motion of a point mass and to introduce the notion of vector field, with the most
relevant differential operators acting on vector fields.


Chapters 2 and 3 are devoted to a general formulation of the theory of
finite-dimensional vector spaces equipped with a scalar product, while Chaps. 4-6
present, via a host of examples and exercises, the theory of finite rank matrices
and their use to solve systems of linear equations.

These are followed by the theory of linear transformations in Chap. 7. Such a
theory is described in Chap. 8 in terms of Dirac's bra-ket formalism, providing
a link to a geometric-algebraic language used in quantum mechanics.

The  notion  of  the  diagonal  action  of  an  endomorphism  or  a  matrix  (the  problem 
of  diagonalisation  and  of  reduction  to  the  Jordan  form)  is  central  in  this  book,  and  it 
is  introduced  in  Chap.  9. 

Again  with  many  solved  exercises  and  examples,  Chap.  10  describes  the  spectral 
theory  for  operators  (matrices)  on  Euclidean  spaces,  and  (in  Chap.  11)  how  it  allows 
one  to  characterise  the  rotations  in  classical  mechanics.  This  is  done  by  introducing 
the  Euler  angles  which  parameterise  rotations  of  the  physical  three-dimensional 
space,  the  notion  of  angular  velocity  and  by  studying  the  motion  of  a  rigid  body 
with  its  inertia  matrix,  and  formulating  the  description  of  the  motion  with  respect  to 
different  inertial  observers,  also  giving  a  characterisation  of  polar  and  axial  vectors. 

Chapter  12  is  devoted  to  the  spectral  theory  for  matrices  acting  on  Hermitian 
spaces  in  order  to  present  a  geometric  setting  to  study  a  finite  level  quantum 
mechanical  system,  where  the  time  evolution  is  given  in  terms  of  the  unitary  group. 
All these notions are related to the notion of Lie algebra and to the exponential
map on the space of finite rank matrices.

In  Chap.  13,  we  present  the  theory  of  quadratic  forms.  Our  focus  is  the 
description of their transformation properties, so as to give the notion of signature,
both  in  the  real  and  in  the  complex  cases.  As  the  most  interesting  example  of  a 
non-Euclidean  quadratic  form,  we  present  the  Minkowski  spacetime  from  special 
relativity  and  the  Maxwell  equations. 

In Chaps. 14 and 15, we introduce through many examples the basics of
Euclidean affine linear geometry and develop them in the study of conic sections, in
Chap. 16, which are related to the theory of Kepler motions of celestial bodies in
classical mechanics. In particular, we show how to characterise a conic by means of
its eccentricity.

A reader of this book is only supposed to know about number sets, more
precisely the natural, integer, rational and real numbers; no additional prior
knowledge is required. To be as self-contained as possible, an appendix
collects a few basic algebraic notions, like those of group, ring and field, maps
between them that preserve the structures (homomorphisms), and polynomials in
one variable. It also presents a few basic properties of the field of complex numbers
and of the field of (classes of) integers modulo a prime number.
Vectors and Coordinate Systems

The notion of a vector, or more precisely of a vector applied at a point, originates in
physics when dealing with an observable quantity. By this, or simply by an observable,
one means anything that can be measured in the physical space (the space of physical
events) via a suitable measuring process. Examples are the velocity of a point
particle, or its acceleration, or a force acting on it. These are characterised at the
point of application by a direction, an orientation and a modulus (or magnitude). In
the  following  pages  we  describe  the  physical  space  in  terms  of  points  and  applied 
vectors,  and  use  these  to  describe  the  physical  observables  related  to  the  motion  of  a 
point  particle  with  respect  to  a  coordinate  system  (a  reference  frame).  The  geometric 
structures  introduced  in  this  chapter  will  be  more  rigorously  analysed  in  the  next 
chapters. 


1.1  Applied  Vectors 

We refer to the common intuition of a physical space made of points, where the
notions of straight line between two points and of the length of a segment (or equivalently
of the distance between two points) are assumed to be given. Then, a vector $\mathbf{v}$ can be
denoted as

\[ \mathbf{v} = B - A \quad \text{or} \quad \mathbf{v} = AB, \]

where $A$, $B$ are two points of the physical space. Then, $A$ is the point of application
of $\mathbf{v}$, its direction is the straight line joining $B$ to $A$, its orientation is the one of the arrow
pointing from $A$ towards $B$, and its modulus is the real number $\|B - A\| = \|A - B\|$,
that is the length (with respect to a fixed unit) of the segment $AB$.
Fig. 1.1 The parallelogram rule


If $S$ denotes the usual three dimensional physical space, we denote by

\[ W^3 = \{ B - A \mid A, B \in S \} \]

the collection of all applied vectors at any point of $S$ and by

\[ V^3_A = \{ B - A \mid B \in S \} \]

the collection of all vectors applied at $A$ in $S$. Then

\[ W^3 = \bigcup_{A \in S} V^3_A. \]

Remark 1.1.1 Once a point $O$ in $S$ is fixed, one sees that there is a bijection between
the set $V^3_O = \{ B - O \mid B \in S \}$ and $S$ itself. Indeed, each point $B$ in $S$ uniquely
determines the element $B - O$ in $V^3_O$, and each element $B - O$ in $V^3_O$ uniquely
determines the point $B$ in $S$.

It is well known that the so called parallelogram rule defines in $V^3_O$ a sum of
vectors, where

\[ (A - O) + (B - O) = (C - O), \]

with $C$ the fourth vertex of the parallelogram whose other three vertices are $A$, $O$,
$B$, as shown in Fig. 1.1.

The vector $\mathbf{0} = O - O$ is called the zero vector (or null vector); notice that its
modulus is zero, while its direction and orientation are undefined.

It is evident that $V^3_O$ is closed with respect to the notion of sum defined above.
That such a sum is associative and commutative is part of the content of the proposition
that follows.

Proposition 1.1.2 The datum $(V^3_O, +, \mathbf{0})$ is an abelian group.

Proof Clearly the zero vector $\mathbf{0}$ is the neutral (identity) element for the sum in $V^3_O$:
added to any vector, it leaves the latter unchanged. Any vector $A - O$ has an inverse
with respect to the sum (that is, any vector has an opposite vector) given by $A' - O$,
where $A'$ is the symmetric point to $A$ with respect to $O$ on the straight line joining
$A$ to $O$ (see Fig. 1.2).

Fig. 1.2 The opposite of a vector: $A' - O = -(A - O)$

From its definition the sum of two vectors is a commutative operation. For the
associativity we give a pictorial argument in Fig. 1.3. □

There is indeed more structure. The physical intuition allows one to consider
multiples of an applied vector. Concerning the collection $V^3_O$, this amounts to defining
an operation involving vectors applied in $O$ and real numbers, which, in order not to
create confusion with vectors, are called (real) scalars.

Definition 1.1.3 Given the scalar $\lambda \in \mathbb{R}$ and the vector $A - O \in V^3_O$, the product
by a scalar

\[ B - O = \lambda (A - O) \]

is the vector such that:

(i) $A$, $B$, $O$ are on the same (straight) line,

(ii) $B - O$ and $A - O$ have the same orientation if $\lambda > 0$, while $A - O$ and
$B - O$ have opposite orientations if $\lambda < 0$,

(iii) $\| B - O \| = |\lambda| \, \| A - O \|$.

The  main  properties  of  the  operation  of  product  by  a  scalar  are  given  in  the 
following  proposition. 

Proposition 1.1.4 For any pair of scalars $\lambda, \mu \in \mathbb{R}$ and any pair of vectors
$A - O$, $B - O \in V^3_O$, it holds that:

1. $\lambda (\mu (A - O)) = (\lambda \mu)(A - O)$,

2. $1 \, (A - O) = A - O$,

3. $\lambda ((A - O) + (B - O)) = \lambda (A - O) + \lambda (B - O)$,

4. $(\lambda + \mu)(A - O) = \lambda (A - O) + \mu (A - O)$.

Fig. 1.4 The scaling $\lambda (C - O) = (C' - O)$ with $\lambda > 1$

Proof 1. Set $C - O = \lambda (\mu (A - O))$ and $D - O = (\lambda \mu)(A - O)$. If one of
the scalars $\lambda$, $\mu$ is zero, one trivially has $C - O = \mathbf{0}$ and $D - O = \mathbf{0}$, so
Point 1. is satisfied. Assume now that $\lambda \neq 0$ and $\mu \neq 0$. Since, by definition,
both $C$ and $D$ are points on the line determined by $O$ and $A$, the vectors $C - O$
and $D - O$ have the same direction. It is easy to see that $C - O$ and $D - O$
have the same orientation: it will coincide with the orientation of $A - O$ or not,
depending on the sign of the product $\lambda \mu \neq 0$. Since $|\lambda \mu| = |\lambda| \, |\mu| \in \mathbb{R}$, one has
$\| C - O \| = \| D - O \|$.

2. It follows directly from the definition.

3. Set $C - O = (A - O) + (B - O)$ and $C' - O = (A' - O) + (B' - O)$,
with $A' - O = \lambda (A - O)$ and $B' - O = \lambda (B - O)$.

We verify that $\lambda (C - O) = C' - O$ (see Fig. 1.4).

Since $OA$ is parallel to $OA'$ by definition, then $BC$ is parallel to $B'C'$; $OB$ is
indeed parallel to $OB'$, so that the planar angles $OBC$ and $OB'C'$ are equal.
Also $\lambda (OB) = OB'$, $\lambda (OA) = OA'$, and $\lambda (BC) = B'C'$. It follows that the
triangles $OBC$ and $OB'C'$ are similar: the vector $OC$ is then parallel to $OC'$ and
they have the same orientation, with $\| OC' \| = \lambda \| OC \|$. From this we obtain
$OC' = \lambda (OC)$.

4. The proof is analogous to the one in point 3. □

What we have described above shows that the operations of sum and product by a
scalar give $V^3_O$ an algebraic structure which is richer than that of an abelian group. Such
a structure, which we shall study in detail in Chap. 2, is naturally called a vector
space.

1.2 Coordinate Systems

The  notion  of  coordinate  system  is  well  known.  We  rephrase  its  main  aspects  in  terms 
of  vector  properties. 

Definition 1.2.1 Given a line $r$, a coordinate system $\Sigma$ on it is defined by a point
$O \in r$ and a vector $\mathbf{i} = A - O$, where $A \in r$ and $A \neq O$.

The point $O$ is called the origin of the coordinate system, the norm $\| A - O \|$ is
the unit of measure (or length) of $\Sigma$, with $\mathbf{i}$ the basis unit vector. The orientation of
$\mathbf{i}$ is the orientation of the coordinate system $\Sigma$.

A coordinate system $\Sigma$ provides a bijection between the points on the line $r$ and
$\mathbb{R}$. Any point $P \in r$ singles out the real number $x$ such that $P - O = x \mathbf{i}$; vice versa,
for any $x \in \mathbb{R}$ one has the point $P \in r$ defined by $P - O = x \mathbf{i}$. One says that $P$
has coordinate $x$, and we shall denote it by $P = (x)$, with respect to the coordinate
system $\Sigma$ that is also denoted as $(O; x)$ or $(O; \mathbf{i})$.

Definition 1.2.2 Given a plane $\alpha$, a coordinate system $\Pi$ on it is defined by a point
$O \in \alpha$ and a pair of non zero distinct (and not having the same direction) vectors
$\mathbf{i} = A - O$ and $\mathbf{j} = B - O$ with $A, B \in \alpha$, and $\| A - O \| = \| B - O \|$.

The point $O$ is the origin of the coordinate system, the (common) norm of the
vectors $\mathbf{i}$, $\mathbf{j}$ is the unit length of $\Pi$, with $\mathbf{i}$, $\mathbf{j}$ the basis unit vectors. The system is
oriented in such a way that the vector $\mathbf{i}$ coincides with $\mathbf{j}$ after an anticlockwise
rotation of angle $\phi$ with $0 < \phi < \pi$. The line defined by $O$ and $\mathbf{i}$, with its given
orientation, is usually referred to as the abscissa axis, while the one defined by $O$
and $\mathbf{j}$, again with its given orientation, is called the ordinate axis.

As before, it is immediate to see that a coordinate system $\Pi$ on $\alpha$ allows one to
define a bijection between points on $\alpha$ and ordered pairs of real numbers. Any
$P \in \alpha$ uniquely provides, via the parallelogram rule (see Fig. 1.5), the ordered
pair $(x, y) \in \mathbb{R}^2$ with $P - O = x \mathbf{i} + y \mathbf{j}$; conversely, for any given ordered pair
$(x, y) \in \mathbb{R}^2$, one defines $P \in \alpha$ as given by $P - O = x \mathbf{i} + y \mathbf{j}$.

With respect to $\Pi$, the elements $x \in \mathbb{R}$ and $y \in \mathbb{R}$ are the coordinates of $P$,
and this will be denoted by $P = (x, y)$. The coordinate system $\Pi$ will be denoted
$(O; \mathbf{i}, \mathbf{j})$ or $(O; x, y)$.
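
The bijection between points of the plane and ordered pairs can be made concrete with a minimal Python sketch. It assumes, purely for illustration, unit-length basis vectors with the second one obtained from the first by an anticlockwise rotation of angle phi, and an auxiliary orthogonal system sharing the origin and the abscissa axis. The function names `to_orthogonal` and `from_orthogonal` are hypothetical and do not come from the text.

```python
import math

def to_orthogonal(x, y, phi):
    """Orthogonal coordinates (X, Y) of the point P with P - O = x i + y j.

    Assumption: i = (1, 0) and j = (cos(phi), sin(phi)) in the auxiliary
    orthogonal system, with 0 < phi < pi.
    """
    return (x + y * math.cos(phi), y * math.sin(phi))

def from_orthogonal(X, Y, phi):
    """Inverse map: recover the pair (x, y) from (X, Y)."""
    y = Y / math.sin(phi)  # sin(phi) is non zero because 0 < phi < pi
    x = X - y * math.cos(phi)
    return (x, y)

phi = math.pi / 3
X, Y = to_orthogonal(2.0, -1.0, phi)
x, y = from_orthogonal(X, Y, phi)
print(x, y)  # recovers the pair (2.0, -1.0) up to rounding
```

The two functions are inverse to each other for every admissible angle, which is the computational counterpart of the bijection described above.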


Fig. 1.5 The bijection $P(x, y) \leftrightarrow P - O = x \mathbf{i} + y \mathbf{j}$ in a plane

Definition 1.2.3 A coordinate system $\Pi = (O; \mathbf{i}, \mathbf{j})$ on a plane $\alpha$ is called an orthogonal
cartesian coordinate system if $\phi = \pi/2$, where $\phi$ is as before the width of the
anticlockwise rotation under which $\mathbf{i}$ coincides with $\mathbf{j}$.

In order to introduce a coordinate system for the physical three dimensional
space, we start by considering three unit-length vectors in $V^3_O$ given as $\mathbf{u} = U - O$,
$\mathbf{v} = V - O$, $\mathbf{w} = W - O$, and we assume the points $O$, $U$, $V$, $W$ not to be on the
same plane. This means that any two vectors, $\mathbf{u}$ and $\mathbf{v}$ say, determine a plane which
does not contain the third point, say $W$. Seen from $W$, the vector $\mathbf{u}$ will coincide
with $\mathbf{v}$ under an anticlockwise rotation by an angle that we denote by $\widehat{\mathbf{u}\mathbf{v}}$.

Definition 1.2.4 An ordered triple $(\mathbf{u}, \mathbf{v}, \mathbf{w})$ of unit vectors in $V^3_O$ which do not lie
on the same plane is called right-handed if the three angles $\widehat{\mathbf{u}\mathbf{v}}$, $\widehat{\mathbf{v}\mathbf{w}}$, $\widehat{\mathbf{w}\mathbf{u}}$, defined by
the prescription above, are smaller than $\pi$. Notice that the order of the vectors matters.

Definition 1.2.5 A coordinate system $\Sigma$ for the space $S$ is given by a point $O \in S$
and three non zero distinct (and not lying on the same plane) vectors $\mathbf{i} = A - O$,
$\mathbf{j} = B - O$ and $\mathbf{k} = C - O$, with $A, B, C \in S$, and $\| A - O \| = \| B - O \| =
\| C - O \|$ and $(\mathbf{i}, \mathbf{j}, \mathbf{k})$ giving a right-handed triple.

The point $O$ is the origin of the coordinate system, the common length of the
vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ is the unit measure in $\Sigma$, with $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ the basis unit vectors. The line
defined by $O$ and $\mathbf{i}$, with its orientation, is the abscissa axis, that defined by $O$ and $\mathbf{j}$
is the ordinate axis, while the one defined by $O$ and $\mathbf{k}$ is the quota axis.

With respect to the coordinate system $\Sigma$, one establishes, via $V^3_O$, a bijection
between ordered triples of real numbers and points in $S$. One has

\[ P \;\leftrightarrow\; P - O \;\leftrightarrow\; (x, y, z) \]

with $P - O = x \mathbf{i} + y \mathbf{j} + z \mathbf{k}$ as in Fig. 1.6. The real numbers $x$, $y$, $z$ are the components
(or coordinates) of the applied vector $P - O$, and this will be denoted by
$P = (x, y, z)$. Accordingly, the coordinate system will be denoted by
$\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k}) = (O; x, y, z)$. The coordinate system $\Sigma$ is called cartesian orthogonal
if the vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ are pairwise orthogonal.

By writing $\mathbf{v} = P - O$, it is convenient to denote by $v_x$, $v_y$, $v_z$ the components
of $\mathbf{v}$ with respect to a cartesian coordinate system $\Sigma$, so as to have

\[ \mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}. \]

In order to simplify the notations, we shall also write this as

\[ \mathbf{v} = (v_x, v_y, v_z), \]

implicitly assuming that such components of $\mathbf{v}$ refer to the cartesian coordinate system
$(O; \mathbf{i}, \mathbf{j}, \mathbf{k})$. Clearly the components of a given vector $\mathbf{v}$ depend on the particular
coordinate system one is using.
Fig. 1.6 The bijection $P(x, y, z) \leftrightarrow P - O = x \mathbf{i} + y \mathbf{j} + z \mathbf{k}$ in the space


Exercise 1.2.6 One has

1. The zero (null) vector $\mathbf{0} = O - O$ has components $(0, 0, 0)$ with respect to any
coordinate system whose origin is $O$, and it is the only vector with this property.

2. Given a coordinate system $\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$, the basis unit vectors have components

\[ \mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1). \]

3. Given a coordinate system $\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$ for the space $S$, we call coordinate
plane each plane determined by a pair of axes of $\Sigma$. We have $\mathbf{v} = (a, b, 0)$, with
$a, b \in \mathbb{R}$, if $\mathbf{v}$ is on the plane $xy$, $\mathbf{v}' = (0, b', c')$ if $\mathbf{v}'$ is on the plane $yz$, and
$\mathbf{v}'' = (a'', 0, c'')$ if $\mathbf{v}''$ is on the plane $xz$.

Example 1.2.7 The motion of a point mass in three dimensional space is described by
a map $t \in \mathbb{R} \mapsto \mathbf{x}(t) \in V^3_O$ where $t$ represents the time variable and $\mathbf{x}(t)$ is the position
of the point mass at time $t$. With respect to a coordinate system $\Sigma = (O; x, y, z)$
we then write

\[ \mathbf{x}(t) = (x(t), y(t), z(t)) \quad \text{or equivalently} \quad \mathbf{x}(t) = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}. \]

The corresponding velocity is a vector applied in $\mathbf{x}(t)$, that is $\mathbf{v}(t) \in V^3_{\mathbf{x}(t)}$, with
components

\[ \mathbf{v}(t) = (v_x(t), v_y(t), v_z(t)) = \frac{d\mathbf{x}(t)}{dt} = \left( \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt} \right), \]

while the acceleration is the vector $\mathbf{a}(t) \in V^3_{\mathbf{x}(t)}$ with components

\[ \mathbf{a}(t) = \frac{d\mathbf{v}(t)}{dt} = \left( \frac{d^2x}{dt^2}, \frac{d^2y}{dt^2}, \frac{d^2z}{dt^2} \right). \]
One also uses the notations

\[ \mathbf{v} = \frac{d\mathbf{x}}{dt} = \dot{\mathbf{x}} \quad \text{and} \quad \mathbf{a} = \frac{d^2\mathbf{x}}{dt^2} = \dot{\mathbf{v}} = \ddot{\mathbf{x}}. \]

In the newtonian formalism for the dynamics, a force acting on the given point
mass is a vector applied in $\mathbf{x}(t)$, that is $\mathbf{F} \in V^3_{\mathbf{x}(t)}$ with components $\mathbf{F} = (F_x, F_y, F_z)$,
and the second law of dynamics is written as

\[ m \, \mathbf{a} = \mathbf{F} \]

where $m > 0$ is the value of the inertial mass of the moving point mass. Such a
relation can be written component-wise as

\[ m \frac{d^2x}{dt^2} = F_x, \quad m \frac{d^2y}{dt^2} = F_y, \quad m \frac{d^2z}{dt^2} = F_z. \]

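A coordinate system for $S$ allows one to express the operations of sum and product
by a scalar in $V^3_O$ in terms of elementary algebraic expressions.

Proposition 1.2.8 With respect to the coordinate system $\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$, let us
consider the vectors $\mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}$ and $\mathbf{w} = w_x \mathbf{i} + w_y \mathbf{j} + w_z \mathbf{k}$, and the scalar
$\lambda \in \mathbb{R}$. One has:

(1) $\mathbf{v} + \mathbf{w} = (v_x + w_x) \mathbf{i} + (v_y + w_y) \mathbf{j} + (v_z + w_z) \mathbf{k}$,

(2) $\lambda \mathbf{v} = \lambda v_x \mathbf{i} + \lambda v_y \mathbf{j} + \lambda v_z \mathbf{k}$.

Proof (1) Since $\mathbf{v} + \mathbf{w} = (v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}) + (w_x \mathbf{i} + w_y \mathbf{j} + w_z \mathbf{k})$, by using the
commutativity and the associativity of the sum of vectors applied at a point, one has

\[ \mathbf{v} + \mathbf{w} = (v_x \mathbf{i} + w_x \mathbf{i}) + (v_y \mathbf{j} + w_y \mathbf{j}) + (v_z \mathbf{k} + w_z \mathbf{k}). \]

Since the product is distributive over the sum, this can be regrouped as in the
claimed identity.

(2) Along the same lines as (1). □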


Remark 1.2.9 By denoting $\mathbf{v} = (v_x, v_y, v_z)$ and $\mathbf{w} = (w_x, w_y, w_z)$, the identities
proven in the proposition above are written as

\[ (v_x, v_y, v_z) + (w_x, w_y, w_z) = (v_x + w_x, v_y + w_y, v_z + w_z), \]

\[ \lambda (v_x, v_y, v_z) = (\lambda v_x, \lambda v_y, \lambda v_z). \]

This suggests a generalisation we shall study in detail in the next chapter. If we
denote by $\mathbb{R}^3$ the set of ordered triples of real numbers, and we consider a pair of
elements $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$ in $\mathbb{R}^3$, with $\lambda \in \mathbb{R}$, one can introduce a sum of
triples and a product by a scalar:

\[ (x_1, x_2, x_3) + (y_1, y_2, y_3) = (x_1 + y_1, x_2 + y_2, x_3 + y_3), \]
\[ \lambda (x_1, x_2, x_3) = (\lambda x_1, \lambda x_2, \lambda x_3). \]
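
As a small illustration of the sum of triples and the product by a scalar, here is a minimal Python sketch. The names `add` and `scale` and the use of plain tuples are choices made only for this example.

```python
from typing import Tuple

Triple = Tuple[float, float, float]

def add(v: Triple, w: Triple) -> Triple:
    """Component-wise sum of two ordered triples of real numbers."""
    return (v[0] + w[0], v[1] + w[1], v[2] + w[2])

def scale(lam: float, v: Triple) -> Triple:
    """Product of the triple v by the scalar lam."""
    return (lam * v[0], lam * v[1], lam * v[2])

v = (1.0, 2.0, 3.0)
w = (4.0, -5.0, 6.0)
print(add(v, w))      # (5.0, -3.0, 9.0)
print(scale(2.0, v))  # (2.0, 4.0, 6.0)
```

With these two functions one can check numerically, on sample triples, that the product by a scalar distributes over the sum.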


1.3  More  Vector  Operations 

In this section we recall the notions, originating in physics, of scalar product,
vector product and mixed product.

Before we do this, as an elementary consequence of Pythagoras' theorem, one
has the following (see Fig. 1.6).

Proposition 1.3.1 Let $\mathbf{v} = (v_x, v_y, v_z)$ be an arbitrary vector in $V^3_O$ with respect to
the cartesian orthogonal coordinate system $(O; \mathbf{i}, \mathbf{j}, \mathbf{k})$. One has

\[ \|\mathbf{v}\| = \sqrt{v_x^2 + v_y^2 + v_z^2}. \]
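
The formula for the modulus can be sketched in Python as follows. The function name `norm` is chosen for this example, and the components are assumed to refer to a cartesian orthogonal coordinate system.

```python
import math

def norm(v):
    """Modulus of a vector given by its three cartesian orthogonal components."""
    vx, vy, vz = v
    return math.sqrt(vx * vx + vy * vy + vz * vz)

print(norm((3.0, 4.0, 12.0)))  # 13.0
```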

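Definition 1.3.2 Let us consider a pair of vectors $\mathbf{v}, \mathbf{w} \in V^3_O$. The scalar product of
$\mathbf{v}$ and $\mathbf{w}$, denoted by $\mathbf{v} \cdot \mathbf{w}$, is the real number

\[ \mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \, \|\mathbf{w}\| \cos\alpha \]

with $\alpha = \widehat{\mathbf{v}\mathbf{w}}$ the plane angle defined by $\mathbf{v}$ and $\mathbf{w}$. Since $\cos\alpha = \cos(-\alpha)$, for this
definition one has $\cos \widehat{\mathbf{v}\mathbf{w}} = \cos \widehat{\mathbf{w}\mathbf{v}}$.

The definition of a scalar product for vectors in $V^2_O$ is completely analogous.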

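Remark 1.3.3 The following properties follow directly from the definition.

(1) If $\mathbf{v} = \mathbf{0}$, then $\mathbf{v} \cdot \mathbf{w} = 0$.

(2) If $\mathbf{v}$, $\mathbf{w}$ are both non zero vectors, then

\[ \mathbf{v} \cdot \mathbf{w} = 0 \iff \cos\alpha = 0 \iff \mathbf{v} \perp \mathbf{w}. \]

(3) For any $\mathbf{v} \in V^3_O$, it holds that

\[ \mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2 \]

and moreover

\[ \mathbf{v} \cdot \mathbf{v} = 0 \iff \mathbf{v} = \mathbf{0}. \]

(4) From (2), (3), if $(O; \mathbf{i}, \mathbf{j}, \mathbf{k})$ is an orthogonal cartesian coordinate system, then

\[ \mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1, \qquad \mathbf{i} \cdot \mathbf{j} = \mathbf{j} \cdot \mathbf{k} = \mathbf{k} \cdot \mathbf{i} = 0. \]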

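Proposition 1.3.4 For any choice of $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V^3_O$ and $\lambda \in \mathbb{R}$, the following identities
hold.

(i) $\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v}$,

(ii) $(\lambda \mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (\lambda \mathbf{w}) = \lambda (\mathbf{v} \cdot \mathbf{w})$,

(iii) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$.

Proof (i) From the definition one has

\[ \mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \, \|\mathbf{w}\| \cos \widehat{\mathbf{v}\mathbf{w}} = \|\mathbf{w}\| \, \|\mathbf{v}\| \cos \widehat{\mathbf{w}\mathbf{v}} = \mathbf{w} \cdot \mathbf{v}. \]

(ii) Setting $a = (\lambda \mathbf{v}) \cdot \mathbf{w}$, $b = \mathbf{v} \cdot (\lambda \mathbf{w})$ and $c = \lambda (\mathbf{v} \cdot \mathbf{w})$, from Definition 1.3.2
and the properties of the norm of a vector, one has

\[ \begin{aligned}
a &= (\lambda \mathbf{v}) \cdot \mathbf{w} = \|\lambda \mathbf{v}\| \, \|\mathbf{w}\| \cos\alpha' = |\lambda| \, \|\mathbf{v}\| \, \|\mathbf{w}\| \cos\alpha', \\
b &= \mathbf{v} \cdot (\lambda \mathbf{w}) = \|\mathbf{v}\| \, \|\lambda \mathbf{w}\| \cos\alpha'' = \|\mathbf{v}\| \, |\lambda| \, \|\mathbf{w}\| \cos\alpha'', \\
c &= \lambda (\mathbf{v} \cdot \mathbf{w}) = \lambda (\|\mathbf{v}\| \, \|\mathbf{w}\| \cos\alpha) = \lambda \|\mathbf{v}\| \, \|\mathbf{w}\| \cos\alpha,
\end{aligned} \]

where $\alpha' = \widehat{(\lambda \mathbf{v}) \mathbf{w}}$, $\alpha'' = \widehat{\mathbf{v} (\lambda \mathbf{w})}$ and $\alpha = \widehat{\mathbf{v}\mathbf{w}}$. If $\lambda = 0$, then $a = b = c = 0$.
If $\lambda > 0$, then $|\lambda| = \lambda$ and $\alpha = \alpha' = \alpha''$; from the commutativity and the
associativity of the product in $\mathbb{R}$, this gives that $a = b = c$. If $\lambda < 0$, then
$|\lambda| = -\lambda$ and $\alpha' = \alpha'' = \pi - \alpha$, thus giving $\cos\alpha' = \cos\alpha'' = -\cos\alpha$. These
read $a = b = c$.

(iii) We sketch the proof for parallel $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$. Under this condition, the result depends
on the relative orientations of the vectors. If $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ have the same orientation,
one has

\[ \begin{aligned}
\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) &= \|\mathbf{u}\| \, \|\mathbf{v} + \mathbf{w}\| \\
&= \|\mathbf{u}\| (\|\mathbf{v}\| + \|\mathbf{w}\|) \\
&= \|\mathbf{u}\| \, \|\mathbf{v}\| + \|\mathbf{u}\| \, \|\mathbf{w}\| \\
&= \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}.
\end{aligned} \]

If $\mathbf{v}$ and $\mathbf{w}$ have the same orientation, which is not the orientation of $\mathbf{u}$, one has

\[ \begin{aligned}
\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) &= -\|\mathbf{u}\| \, \|\mathbf{v} + \mathbf{w}\| \\
&= -\|\mathbf{u}\| (\|\mathbf{v}\| + \|\mathbf{w}\|) \\
&= -\|\mathbf{u}\| \, \|\mathbf{v}\| - \|\mathbf{u}\| \, \|\mathbf{w}\| \\
&= \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}.
\end{aligned} \]

We leave the reader to explicitly prove the other cases. □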
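By expressing vectors in $V^3_O$ in terms of an orthogonal cartesian coordinate system,
the scalar product has an expression that will allow us to define the scalar product of
vectors in the more general situation of euclidean spaces.

Proposition 1.3.5 Given $(O; \mathbf{i}, \mathbf{j}, \mathbf{k})$, an orthogonal cartesian coordinate system for
$S$, with vectors $\mathbf{v} = (v_x, v_y, v_z)$ and $\mathbf{w} = (w_x, w_y, w_z)$ in $V^3_O$, one has

\[ \mathbf{v} \cdot \mathbf{w} = v_x w_x + v_y w_y + v_z w_z. \]

Proof With $\mathbf{v} = v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}$ and $\mathbf{w} = w_x \mathbf{i} + w_y \mathbf{j} + w_z \mathbf{k}$, from Proposition 1.3.4,
one has

\[ \begin{aligned}
\mathbf{v} \cdot \mathbf{w} &= (v_x \mathbf{i} + v_y \mathbf{j} + v_z \mathbf{k}) \cdot (w_x \mathbf{i} + w_y \mathbf{j} + w_z \mathbf{k}) \\
&= v_x w_x \, \mathbf{i} \cdot \mathbf{i} + v_y w_x \, \mathbf{j} \cdot \mathbf{i} + v_z w_x \, \mathbf{k} \cdot \mathbf{i} \\
&\quad + v_x w_y \, \mathbf{i} \cdot \mathbf{j} + v_y w_y \, \mathbf{j} \cdot \mathbf{j} + v_z w_y \, \mathbf{k} \cdot \mathbf{j} \\
&\quad + v_x w_z \, \mathbf{i} \cdot \mathbf{k} + v_y w_z \, \mathbf{j} \cdot \mathbf{k} + v_z w_z \, \mathbf{k} \cdot \mathbf{k}.
\end{aligned} \]

The result follows directly from (4) in Remark 1.3.3, that is $\mathbf{i} \cdot \mathbf{j} = \mathbf{j} \cdot \mathbf{k} = \mathbf{k} \cdot \mathbf{i} = 0$
as well as $\mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1$. □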

Exercise 1.3.6 With respect to a given cartesian orthogonal coordinate system, consider
the vectors $\mathbf{v} = (2, 3, 1)$ and $\mathbf{w} = (1, -1, 1)$. We verify they are orthogonal.
From (2) in Remark 1.3.3 this is equivalent to showing that $\mathbf{v} \cdot \mathbf{w} = 0$. From Proposition
1.3.5, one has $\mathbf{v} \cdot \mathbf{w} = 2 \cdot 1 + 3 \cdot (-1) + 1 \cdot 1 = 0$.
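
The computation in this exercise can be reproduced with a short Python sketch of the component formula for the scalar product. The names `dot` and `are_orthogonal`, and the tolerance parameter, are assumptions made for this example.

```python
def dot(v, w):
    """Scalar product in cartesian orthogonal components."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2]

def are_orthogonal(v, w, tol=1e-12):
    """True when the scalar product vanishes up to the tolerance tol."""
    return abs(dot(v, w)) <= tol

v = (2, 3, 1)
w = (1, -1, 1)
print(dot(v, w))             # 0
print(are_orthogonal(v, w))  # True
```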

Example 1.3.7 If the map $\mathbf{x} : \mathbb{R} \ni t \mapsto \mathbf{x}(t) \in V^3_O$ describes the motion (notice
that the range of the map gives the trajectory) of a point mass (with mass $m$), its
kinetic energy is defined by

\[ T = \frac{1}{2} m \, \|\mathbf{v}(t)\|^2. \]

With respect to an orthogonal coordinate system $\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$, given
$\mathbf{v}(t) = (v_x(t), v_y(t), v_z(t))$ as in Example 1.2.7, we have from Proposition 1.3.5
that

\[ T = \frac{1}{2} m \, (v_x^2 + v_y^2 + v_z^2). \]
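
A minimal Python sketch of the kinetic energy in components follows. The function name `kinetic_energy` and the sample values are chosen only for illustration.

```python
def kinetic_energy(m, v):
    """Kinetic energy of a point mass m with velocity components v = (vx, vy, vz)."""
    vx, vy, vz = v
    return 0.5 * m * (vx * vx + vy * vy + vz * vz)

print(kinetic_energy(2.0, (1.0, 2.0, 2.0)))  # 9.0
```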

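Also the following notion will be generalised in the context of euclidean spaces.

Definition 1.3.8 Given two non zero vectors $\mathbf{v}$ and $\mathbf{w}$ in $V^3_O$, the orthogonal projection
of $\mathbf{v}$ along $\mathbf{w}$ is defined as the vector $\mathbf{v}_{\mathbf{w}}$ in $V^3_O$ given by

\[ \mathbf{v}_{\mathbf{w}} = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{w}\|^2} \, \mathbf{w}. \]

As the first part of Fig. 1.7 displays, $\mathbf{v}_{\mathbf{w}}$ is parallel to $\mathbf{w}$.

From the identities proven in Proposition 1.3.4 one easily has

Proposition 1.3.9 For any $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V^3_O$, the following identities hold:

(a) $(\mathbf{u} + \mathbf{v})_{\mathbf{w}} = \mathbf{u}_{\mathbf{w}} + \mathbf{v}_{\mathbf{w}}$,

(b) $\mathbf{v} \cdot \mathbf{w} = \mathbf{v}_{\mathbf{w}} \cdot \mathbf{w} = \mathbf{w}_{\mathbf{v}} \cdot \mathbf{v}$.

Point (a) is illustrated by the second part of Fig. 1.7.

Fig. 1.7 Orthogonal projections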
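
The two identities can be checked numerically with a minimal Python sketch. It assumes the standard expression of the orthogonal projection of v along w, namely the scalar (v · w) / (w · w) multiplying w. The names `dot` and `project` are hypothetical helpers for this example.

```python
def dot(v, w):
    """Scalar product in cartesian orthogonal components."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2]

def project(v, w):
    """Orthogonal projection of v along the non zero vector w (assumed formula)."""
    c = dot(v, w) / dot(w, w)
    return (c * w[0], c * w[1], c * w[2])

u = (1.0, 2.0, 3.0)
v = (2.0, 0.0, -1.0)
w = (0.0, 3.0, 4.0)

v_w = project(v, w)
w_v = project(w, v)
# Identity (b): the three numbers agree up to rounding.
print(dot(v, w), dot(v_w, w), dot(w_v, v))
```

Identity (a) can be tested in the same way by projecting the component-wise sum of u and v and comparing with the sum of the two projections.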

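Remark 1.3.10 The scalar product we have defined is a map

\[ \sigma : V^3_O \times V^3_O \to \mathbb{R}, \qquad \sigma(\mathbf{v}, \mathbf{w}) = \mathbf{v} \cdot \mathbf{w}. \]

Also, the scalar product of vectors on a plane is a map $\sigma : V^2_O \times V^2_O \to \mathbb{R}$.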

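Definition 1.3.11 Let $\mathbf{v}, \mathbf{w} \in V^3_O$. The vector product between $\mathbf{v}$ and $\mathbf{w}$, denoted by
$\mathbf{v} \wedge \mathbf{w}$, is defined as the vector in $V^3_O$ whose modulus is

\[ \|\mathbf{v} \wedge \mathbf{w}\| = \|\mathbf{v}\| \, \|\mathbf{w}\| \sin\alpha, \]

where $\alpha = \widehat{\mathbf{v}\mathbf{w}}$, with $0 < \alpha < \pi$, is the angle defined by $\mathbf{v}$ and $\mathbf{w}$; the direction of $\mathbf{v} \wedge \mathbf{w}$
is orthogonal to both $\mathbf{v}$ and $\mathbf{w}$; and its orientation is such that $(\mathbf{v}, \mathbf{w}, \mathbf{v} \wedge \mathbf{w})$ is a
right-handed triple as in Definition 1.2.4.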

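Remark 1.3.12 The following properties follow directly from the definition.

(i) if $\mathbf{v} = \mathbf{0}$ then $\mathbf{v} \wedge \mathbf{w} = \mathbf{0}$,

(ii) if $\mathbf{v}$ and $\mathbf{w}$ are both non zero then

\[ \mathbf{v} \wedge \mathbf{w} = \mathbf{0} \iff \sin\alpha = 0 \iff \mathbf{v} \parallel \mathbf{w} \]

(one trivially has $\mathbf{v} \wedge \mathbf{v} = \mathbf{0}$),

(iii) if $(O; \mathbf{i}, \mathbf{j}, \mathbf{k})$ is an orthogonal cartesian coordinate system, then

\[ \mathbf{i} \wedge \mathbf{j} = \mathbf{k} = -\mathbf{j} \wedge \mathbf{i}, \qquad \mathbf{j} \wedge \mathbf{k} = \mathbf{i} = -\mathbf{k} \wedge \mathbf{j}, \qquad \mathbf{k} \wedge \mathbf{i} = \mathbf{j} = -\mathbf{i} \wedge \mathbf{k}. \]

We omit the proof of the following proposition.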


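Proposition 1.3.13 For any $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V^3_O$ and $\lambda \in \mathbb{R}$, the following identities hold:

(i) $\mathbf{v} \wedge \mathbf{w} = -\mathbf{w} \wedge \mathbf{v}$,

(ii) $(\lambda \mathbf{v}) \wedge \mathbf{w} = \mathbf{v} \wedge (\lambda \mathbf{w}) = \lambda (\mathbf{v} \wedge \mathbf{w})$,

(iii) $\mathbf{u} \wedge (\mathbf{v} + \mathbf{w}) = \mathbf{u} \wedge \mathbf{v} + \mathbf{u} \wedge \mathbf{w}$.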

Exercise 1.3.14 With respect to a given cartesian orthogonal coordinate system,
consider in $V^3_O$ the vectors $\mathbf{v} = (1, 0, -1)$ and $\mathbf{w} = (-2, 0, 2)$. To verify that they are
parallel, we recall the result (ii) in Remark 1.3.12 above and compute, using
Proposition 1.3.15, that $\mathbf{v} \wedge \mathbf{w} = \mathbf{0}$.

Proposition 1.3.15 Let $\mathbf{v} = (v_x, v_y, v_z)$ and $\mathbf{w} = (w_x, w_y, w_z)$ be elements in $V^3_O$
with respect to a given cartesian orthogonal coordinate system. One has

\[ \mathbf{v} \wedge \mathbf{w} = (v_y w_z - v_z w_y, \; v_z w_x - v_x w_z, \; v_x w_y - v_y w_x). \]

Proof Given Remark 1.3.12 and Proposition 1.3.13, this comes as an easy
computation. □
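
The component formula for the vector product can be sketched in Python as follows. The function name `cross` is chosen for this example. The two printed checks reproduce the product of the first two basis unit vectors and the pair of parallel vectors of Exercise 1.3.14.

```python
def cross(v, w):
    """Vector product in cartesian orthogonal components."""
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j))                    # (0, 0, 1)
print(cross((1, 0, -1), (-2, 0, 2)))  # (0, 0, 0)
```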

Remark 1.3.16 The vector product defines a map

\[ \tau : V^3_O \times V^3_O \to V^3_O, \qquad \tau(\mathbf{v}, \mathbf{w}) = \mathbf{v} \wedge \mathbf{w}. \]

Clearly, such a map has no meaning on a plane.

Example 1.3.17 By slightly extending Definition 1.3.11, one can use the vector
product for additional notions coming from physics. Following Sect. 1.1, we
consider vectors $\mathbf{u}$, $\mathbf{w}$ as elements in $W^3$, that is vectors applied at arbitrary
points in the physical three dimensional space $S$, with components $\mathbf{u} = (u_x, u_y, u_z)$
and $\mathbf{w} = (w_x, w_y, w_z)$ with respect to a cartesian orthogonal coordinate system
$\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$. In parallel with Proposition 1.3.15, we define $\tau : W^3 \times W^3 \to W^3$
as

\[ \mathbf{u} \wedge \mathbf{w} = (u_y w_z - u_z w_y, \; u_z w_x - u_x w_z, \; u_x w_y - u_y w_x). \]

If $\mathbf{u} \in V^3_{\mathbf{x}}$ is a vector applied at $\mathbf{x}$, its momentum with respect to a point $\mathbf{x}' \in S$ is the
vector in $W^3$ defined by

\[ \mathbf{M} = (\mathbf{x} - \mathbf{x}') \wedge \mathbf{u}. \]

In particular, if $\mathbf{u} = \mathbf{F}$ is a force acting on a point mass in $\mathbf{x}$, its momentum is
$\mathbf{M} = (\mathbf{x} - \mathbf{x}') \wedge \mathbf{F}$.

If $\mathbf{x}(t) \in V^3_O$ describes the motion of a point mass (with mass $m > 0$), whose
velocity is $\mathbf{v}(t)$, then its corresponding angular momentum with respect to a point $\mathbf{x}'$
is defined by

\[ \mathbf{L}_{\mathbf{x}'}(t) = (\mathbf{x}(t) - \mathbf{x}') \wedge m \mathbf{v}(t). \]
Exercise 1.3.18 The angular momentum is usually defined with respect to the origin of the coordinate system $\Sigma$, giving $\mathbf{L}_O(t) = \mathbf{x}(t) \wedge m\mathbf{v}(t)$. If we consider a circular uniform motion

$$\mathbf{x}(t) = \big(x(t) = r\cos(\omega t),\; y(t) = r\sin(\omega t),\; z(t) = 0\big),$$

with $r > 0$ the radius of the trajectory and $\omega \in \mathbb{R}$ the angular velocity, then

$$\mathbf{v}(t) = \big(v_x(t) = -r\omega\sin(\omega t),\; v_y(t) = r\omega\cos(\omega t),\; v_z(t) = 0\big)$$

so that, since $x v_y - y v_x = r^2\omega\,(\cos^2(\omega t) + \sin^2(\omega t)) = r^2\omega$,

$$\mathbf{L}_O(t) = (0, 0, m r^2 \omega).$$

Thus, a circular motion on the $xy$ plane has angular momentum along the $z$ axis.
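The computation of this exercise can be checked numerically. The sketch below is only an illustration under our own assumptions: the helper names `cross` and `angular_momentum` and the sample values of $m$, $r$, $\omega$, $t$ are hypothetical choices, not taken from the text. Evaluating $\mathbf{x}(t) \wedge m\mathbf{v}(t)$ component by component gives a vector along the $z$ axis whose third component equals $m r^2 \omega$ at every time $t$.

```python
import math

def cross(v, w):
    """Vector product in components, as in Proposition 1.3.15."""
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def angular_momentum(m, r, omega, t):
    """L_O(t) = x(t) ^ m v(t) for the uniform circular motion of the exercise."""
    x = (r * math.cos(omega * t), r * math.sin(omega * t), 0.0)
    v = (-r * omega * math.sin(omega * t), r * omega * math.cos(omega * t), 0.0)
    return cross(x, tuple(m * c for c in v))

# Sample values (arbitrary): m * r**2 * omega = 2 * 9 * 0.5 = 9.
L = angular_momentum(m=2.0, r=3.0, omega=0.5, t=1.7)
print(L)  # approximately (0.0, 0.0, 9.0)
```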

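Definition 1.3.19 Given an ordered triple $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V^3_O$, their mixed product is the real number

$$\mathbf{u} \cdot (\mathbf{v} \wedge \mathbf{w}).$$

Proposition 1.3.20 Given a cartesian orthogonal coordinate system in $S$ with $\mathbf{u} = (u_x, u_y, u_z)$, $\mathbf{v} = (v_x, v_y, v_z)$ and $\mathbf{w} = (w_x, w_y, w_z)$ in $V^3_O$, one has

$$\mathbf{u} \cdot (\mathbf{v} \wedge \mathbf{w}) = u_x(v_y w_z - v_z w_y) + u_y(v_z w_x - v_x w_z) + u_z(v_x w_y - v_y w_x).$$

Proof It follows immediately from Propositions 1.3.5 and 1.3.15. □

In the space $S$, the modulus of the vector product $\mathbf{v} \wedge \mathbf{w}$ is the area of the parallelogram defined by $\mathbf{v}$ and $\mathbf{w}$, while the absolute value of the mixed product $\mathbf{u} \cdot (\mathbf{v} \wedge \mathbf{w})$ gives the volume of the parallelepiped defined by $\mathbf{u}, \mathbf{v}, \mathbf{w}$.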

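Proposition 1.3.21 Given $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V^3_O$.

1. Denote by $\alpha = \widehat{\mathbf{v}\mathbf{w}}$ the angle defined by $\mathbf{v}$ and $\mathbf{w}$. Then the area $A$ of the parallelogram whose edges are $\mathbf{v}$ and $\mathbf{w}$ is given by

$$A = \|\mathbf{v}\|\,\|\mathbf{w}\| \sin\alpha = \|\mathbf{v} \wedge \mathbf{w}\|.$$

2. Denote by $\theta = \widehat{\mathbf{u}(\mathbf{v} \wedge \mathbf{w})}$ the angle defined by $\mathbf{u}$ and $\mathbf{v} \wedge \mathbf{w}$. Then the volume $V$ of the parallelepiped whose edges are $\mathbf{u}, \mathbf{v}, \mathbf{w}$ is given by

$$V = A\,\|\mathbf{u}\|\,|\cos\theta| = |\mathbf{u} \cdot (\mathbf{v} \wedge \mathbf{w})|.$$

Proof The claim is evident, as shown in Figs. 1.8 and 1.9. □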
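The two formulas of Proposition 1.3.21 lend themselves to a direct numerical check. The following is a minimal sketch; the helper names (`cross`, `dot`, `norm`, `area`, `volume`) and the sample vectors are assumptions made here for illustration only.

```python
import math

def cross(v, w):
    """Vector product in components (Proposition 1.3.15)."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def dot(u, v):
    """Scalar product in components."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def area(v, w):
    """Area of the parallelogram with edges v and w."""
    return norm(cross(v, w))

def volume(u, v, w):
    """Volume of the parallelepiped with edges u, v, w."""
    return abs(dot(u, cross(v, w)))

v, w, u = (1, 0, 0), (0, 2, 0), (1, 1, 3)
print(area(v, w))       # 2.0
print(volume(u, v, w))  # 6
```

The sheared edge $\mathbf{u} = (1, 1, 3)$ gives the same volume as $(0, 0, 3)$ would, since only the component of $\mathbf{u}$ along $\mathbf{v} \wedge \mathbf{w}$ contributes.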



Fig. 1.8 The area of the parallelogram with edges v and w


Fig. 1.9 The volume of the parallelepiped with edges v, w, u


1.4  Divergence,  Rotor,  Gradient  and  Laplacian 

We  close  this  chapter  by  describing  how  the  notion  of  vector  applied  at  a  point  also 
allows  one  to  introduce  a  definition  of  a  vector  field. 

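The intuition coming from physics requires one to consider, for each point $x$ in the physical space $S$, a vector applied at $x$. We describe it as a map

$$S \ni x \mapsto \mathbf{A}(x) \in V^3_x.$$

With respect to a given cartesian orthogonal reference system for $S$ we can write this in components as $x = (x_1, x_2, x_3)$ and $\mathbf{A}(x) = (A_1(x), A_2(x), A_3(x))$, and one can act on a vector field with partial derivatives (first order differential operators), $\partial_a = (\partial/\partial x_a)$ with $a = 1, 2, 3$, defined as usual by

$$\partial_a(x_b) = \delta_{ab} \quad \text{with} \quad \delta_{ab} = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{if } a \neq b. \end{cases}$$

Then (omitting the explicit dependence of $\mathbf{A}$ on $x$) one defines

$$\operatorname{div} \mathbf{A} = \sum_{k=1}^{3} (\partial_k A_k) \in \mathbb{R},$$

$$\operatorname{rot} \mathbf{A} = (\partial_2 A_3 - \partial_3 A_2)\,\mathbf{i} + (\partial_3 A_1 - \partial_1 A_3)\,\mathbf{j} + (\partial_1 A_2 - \partial_2 A_1)\,\mathbf{k} \in V^3_x.$$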
By introducing the triple $\nabla = (\partial_1, \partial_2, \partial_3)$, such actions can be formally written as a scalar product and a vector product, that is

$$\operatorname{div} \mathbf{A} = \nabla \cdot \mathbf{A}, \qquad \operatorname{rot} \mathbf{A} = \nabla \wedge \mathbf{A}.$$

Furthermore, if $f : S \to \mathbb{R}$ is a real valued function defined on $S$, that is a (real) scalar field on $S$, one has the grad operator

$$\operatorname{grad} f = \nabla f = (\partial_1 f, \partial_2 f, \partial_3 f)$$

as well as the Laplacian operator

$$\nabla^2 f = \operatorname{div}(\nabla f) = \Big(\sum_{k=1}^{3} \partial_k \partial_k\Big) f = \partial_1^2 f + \partial_2^2 f + \partial_3^2 f.$$

Exercise 1.4.1 The properties of the mixed product yield a straightforward proof of the identity

$$\operatorname{div}(\operatorname{rot} \mathbf{A}) = \nabla \cdot (\nabla \wedge \mathbf{A}) = 0,$$

for any vector field $\mathbf{A}$. On the other hand, a direct computation also shows the identity

$$\operatorname{rot}(\operatorname{grad} f) = \nabla \wedge (\operatorname{grad} f) = 0,$$

for any scalar field $f$.
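Both identities can be observed numerically. The sketch below replaces each $\partial_a$ by a central finite difference, which is an approximation we introduce for illustration and not the method of the text; the step `H`, the sample fields `A` and `f`, the point `p` and all function names are hypothetical choices. Since central differences along different axes commute, the two combinations vanish up to rounding errors.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary illustrative choice)

def partial(f, a, h=H):
    """Central-difference approximation of the partial derivative d_a f."""
    def df(x):
        xp, xm = list(x), list(x)
        xp[a] += h
        xm[a] -= h
        return (f(tuple(xp)) - f(tuple(xm))) / (2 * h)
    return df

def component(A, k):
    """The scalar field x -> A_k(x) extracted from a vector field A."""
    return lambda x: A(x)[k]

def grad(f):
    return lambda x: tuple(partial(f, a)(x) for a in range(3))

def div(A):
    return lambda x: sum(partial(component(A, k), k)(x) for k in range(3))

def rot(A):
    def curl(x):
        d = lambda a, b: partial(component(A, b), a)(x)  # d_a A_b at x
        # Indices are 0-based here: (d2A3 - d3A2, d3A1 - d1A3, d1A2 - d2A1).
        return (d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0))
    return curl

def A(x):  # a sample vector field
    return (x[1] * x[2], math.sin(x[0]) * x[2], x[0] * x[1] ** 2)

def f(x):  # a sample scalar field
    return x[0] ** 2 * x[1] + math.sin(x[2])

p = (0.3, -0.7, 1.1)
print(abs(div(rot(A))(p)) < 1e-6)                   # True
print(all(abs(c) < 1e-6 for c in rot(grad(f))(p)))  # True
```

A numerical check at one point is of course no substitute for the proof asked for in the exercise; it only makes the statement plausible.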


Chapter  2 

Vector  Spaces 



The notion of vector space can be defined over any field $\mathbb{K}$. We shall mainly consider the case $\mathbb{K} = \mathbb{R}$ and briefly mention the case $\mathbb{K} = \mathbb{C}$. Starting from our exposition, it is straightforward to generalise to any field.


2.1  Definition  and  Basic  Properties 

The model of the construction is the collection of all vectors in the space applied at a point with the operations of sum and multiplication by a scalar, as described in Chap. 1.

Definition 2.1.1 A non empty set $V$ is called a vector space over $\mathbb{R}$ (or a real vector space or an $\mathbb{R}$-vector space) if there are defined two operations,

(a) an internal one: a sum of vectors $s : V \times V \to V$,

$$V \times V \ni (v, v') \mapsto s(v, v') = v + v',$$

(b) an external one: the product by a scalar $p : \mathbb{R} \times V \to V$,

$$\mathbb{R} \times V \ni (k, v) \mapsto p(k, v) = kv,$$

and these operations are required to satisfy the following conditions:

(1) There exists an element $0_V \in V$, which is neutral for the sum, such that $(V, +, 0_V)$ is an abelian group.

For any $k, k' \in \mathbb{R}$ and $v, v' \in V$ one has

(2) $(k + k')v = kv + k'v$

(3) $k(v + v') = kv + kv'$


©  Springer  International  Publishing  AG,  part  of  Springer  Nature  2018 
G.  Landi  and  A.  Zampini,  Linear  Algebra  and  Analytic  Geometry 
for  Physical  Sciences ,  Undergraduate  Lecture  Notes  in  Physics, 
https://doi.org/10.1007/978-3-319-78361-1_2
(4) $k(k'v) = (kk')v$

(5) $1v = v$, with $1 = 1_{\mathbb{R}}$.

The elements of a vector space are called vectors; the element $0_V$ is the zero or null vector. A vector space is also called a linear space.

Remark 2.1.2 Given the properties of a group (see A.2.9), the null vector $0_V$ and the opposite $-v$ to any vector $v$ are (in any given vector space) unique. The sums can indeed be simplified, that is $v + w = v + u \Rightarrow w = u$. Such a statement is easily proven by adding the element $-v$ to both sides of $v + w = v + u$ and using the associativity of the sum.

As already seen in Chap. 1, the collections $V^2_O$ (vectors in a plane) and $V^3_O$ (vectors in the space) applied at the point $O$ are real vector spaces. The bijection $V^3_O \leftrightarrow \mathbb{R}^3$ introduced in Definition 1.2.5, together with Remark 1.2.9, suggests the natural definitions of sum and product by a scalar for the set $\mathbb{R}^3$ of ordered triples of real numbers.

Proposition 2.1.3 The collection $\mathbb{R}^3$ of triples of real numbers together with the operations defined by

I. $(x_1, x_2, x_3) + (y_1, y_2, y_3) = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$, for any $(x_1, x_2, x_3), (y_1, y_2, y_3) \in \mathbb{R}^3$,

II. $a(x_1, x_2, x_3) = (ax_1, ax_2, ax_3)$, for any $a \in \mathbb{R}$, $(x_1, x_2, x_3) \in \mathbb{R}^3$,

is a real vector space.

Proof We verify that the conditions given in Definition 2.1.1 are satisfied. We first notice that (a) and (b) are fulfilled, since $\mathbb{R}^3$ is closed with respect to the operations in I. and II. of sum and product by a scalar. The neutral element for the sum is $0_{\mathbb{R}^3} = (0, 0, 0)$, since one clearly has

$$(x_1, x_2, x_3) + (0, 0, 0) = (x_1, x_2, x_3).$$

The datum $(\mathbb{R}^3, +, 0_{\mathbb{R}^3})$ is an abelian group, since one has

• The sum $(\mathbb{R}^3, +)$ is associative, from the associativity of the sum in $\mathbb{R}$:

$$\begin{aligned}
(x_1, x_2, x_3) + ((y_1, y_2, y_3) + (z_1, z_2, z_3)) &= (x_1, x_2, x_3) + (y_1 + z_1, y_2 + z_2, y_3 + z_3) \\
&= (x_1 + (y_1 + z_1), x_2 + (y_2 + z_2), x_3 + (y_3 + z_3)) \\
&= ((x_1 + y_1) + z_1, (x_2 + y_2) + z_2, (x_3 + y_3) + z_3) \\
&= (x_1 + y_1, x_2 + y_2, x_3 + y_3) + (z_1, z_2, z_3) \\
&= ((x_1, x_2, x_3) + (y_1, y_2, y_3)) + (z_1, z_2, z_3).
\end{aligned}$$


• From the identity

$$(x_1, x_2, x_3) + (-x_1, -x_2, -x_3) = (x_1 - x_1, x_2 - x_2, x_3 - x_3) = (0, 0, 0)$$

one has $(-x_1, -x_2, -x_3)$ as the opposite in $\mathbb{R}^3$ of the element $(x_1, x_2, x_3)$.

• The group $(\mathbb{R}^3, +)$ is commutative, since the sum in $\mathbb{R}$ is commutative:

$$\begin{aligned}
(x_1, x_2, x_3) + (y_1, y_2, y_3) &= (x_1 + y_1, x_2 + y_2, x_3 + y_3) \\
&= (y_1 + x_1, y_2 + x_2, y_3 + x_3) \\
&= (y_1, y_2, y_3) + (x_1, x_2, x_3).
\end{aligned}$$

We leave to the reader the task of showing that the conditions (2), (3), (4), (5) in Definition 2.1.1 are satisfied: for any $\lambda, \lambda' \in \mathbb{R}$ and any $(x_1, x_2, x_3), (y_1, y_2, y_3) \in \mathbb{R}^3$ it holds that

1. $(\lambda + \lambda')(x_1, x_2, x_3) = \lambda(x_1, x_2, x_3) + \lambda'(x_1, x_2, x_3)$

2. $\lambda((x_1, x_2, x_3) + (y_1, y_2, y_3)) = \lambda(x_1, x_2, x_3) + \lambda(y_1, y_2, y_3)$

3. $\lambda(\lambda'(x_1, x_2, x_3)) = (\lambda\lambda')(x_1, x_2, x_3)$

4. $1(x_1, x_2, x_3) = (x_1, x_2, x_3)$. □

The previous proposition can be generalised in a natural way. If $n \in \mathbb{N}$ is a positive natural number, one defines the $n$-th cartesian product of $\mathbb{R}$, that is the collection of ordered $n$-tuples of real numbers

$$\mathbb{R}^n = \{X = (x_1, \dots, x_n) : x_k \in \mathbb{R}\},$$

and the following operations, with $a \in \mathbb{R}$, $(x_1, \dots, x_n), (y_1, \dots, y_n) \in \mathbb{R}^n$:

In. $(x_1, \dots, x_n) + (y_1, \dots, y_n) = (x_1 + y_1, \dots, x_n + y_n)$

IIn. $a(x_1, \dots, x_n) = (ax_1, \dots, ax_n)$.

One then has the following.

Proposition 2.1.4 With respect to the above operations, the set $\mathbb{R}^n$ is a vector space over $\mathbb{R}$.

The elements in $\mathbb{R}^n$ are called $n$-tuples of real numbers. With the notation $X = (x_1, \dots, x_n) \in \mathbb{R}^n$, the scalar $x_k$, with $k = 1, 2, \dots, n$, is the $k$-th component of the vector $X$.
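The operations In. and IIn. translate directly into code. The following sketch is an illustration under our own assumptions: the names `add` and `scale`, the use of Python's `Fraction` (so that the arithmetic is exact) and the sample $4$-tuples are choices made here. Checking the conditions of Definition 2.1.1 on sample data does not prove Proposition 2.1.4; it only shows what each condition says concretely.

```python
from fractions import Fraction as F

def add(X, Y):
    """Operation In.: component-wise sum of two n-tuples."""
    if len(X) != len(Y):
        raise ValueError("tuples must have the same length")
    return tuple(x + y for x, y in zip(X, Y))

def scale(a, X):
    """Operation IIn.: product of the n-tuple X by the scalar a."""
    return tuple(a * x for x in X)

X = (F(1), F(-2), F(1, 3), F(0))
Y = (F(5), F(1, 2), F(2), F(-7))
k, k2 = F(3, 4), F(-2)
zero = (F(0),) * 4

checks = {
    "neutral element": add(X, zero) == X,
    "opposite": add(X, scale(F(-1), X)) == zero,
    "commutativity": add(X, Y) == add(Y, X),
    "condition (2)": scale(k + k2, X) == add(scale(k, X), scale(k2, X)),
    "condition (3)": scale(k, add(X, Y)) == add(scale(k, X), scale(k, Y)),
    "condition (4)": scale(k, scale(k2, X)) == scale(k * k2, X),
    "condition (5)": scale(F(1), X) == X,
}
for name, ok in checks.items():
    print(name, ok)
```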

Example 2.1.5 As in Definition A.3.3, consider the collection of all polynomials in the indeterminate $x$ and coefficients in $\mathbb{R}$, that is

$$\mathbb{R}[x] = \{f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n : a_k \in \mathbb{R},\ n \geq 0\},$$

with the operations of sum and product by a scalar $\lambda \in \mathbb{R}$ defined, for any pair of elements in $\mathbb{R}[x]$, $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ and $g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_m x^m$, component-wise by

Ip. $f(x) + g(x) = a_0 + b_0 + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots$

IIp. $\lambda f(x) = \lambda a_0 + \lambda a_1 x + \lambda a_2 x^2 + \cdots + \lambda a_n x^n$.

Endowed with the previous operations, the set $\mathbb{R}[x]$ is a real vector space; $\mathbb{R}[x]$ is indeed closed with respect to the operations above. The null polynomial, denoted by $0_{\mathbb{R}[x]}$ (that is the polynomial with all coefficients equal to zero), is the neutral element for the sum. The opposite to the polynomial $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ is the polynomial $(-a_0 - a_1 x - a_2 x^2 - \cdots - a_n x^n) \in \mathbb{R}[x]$ that one denotes by $-f(x)$. We leave to the reader to prove that $(\mathbb{R}[x], +, 0_{\mathbb{R}[x]})$ is an abelian group and that all the additional conditions in Definition 2.1.1 are fulfilled.

Exercise 2.1.6 We know from Proposition A.3.5 that $\mathbb{R}[x]_r$, the subset in $\mathbb{R}[x]$ of polynomials with degree not larger than a fixed $r \in \mathbb{N}$, is closed under addition of polynomials. Since the degree of the polynomial $\lambda f(x)$ coincides with the degree of $f(x)$ for any $\lambda \neq 0$, we see that also the product by a scalar, as defined in IIp. above, is defined consistently on $\mathbb{R}[x]_r$. It is easy to verify that also $\mathbb{R}[x]_r$ is a real vector space.
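The operations Ip. and IIp. act on the coefficients only, which suggests a simple model. In the sketch below a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ is stored as the list `[a0, a1, ..., an]`; this representation, the function names, and the convention that the null polynomial has degree `-1` are all assumptions made here for illustration.

```python
from itertools import zip_longest

def poly_add(f, g):
    """Operation Ip.: add coefficients of equal powers (missing ones count as 0)."""
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_scale(lam, f):
    """Operation IIp.: multiply every coefficient by the scalar lam."""
    return [lam * a for a in f]

def degree(f):
    """Index of the last non zero coefficient; -1 for the null polynomial."""
    for k in range(len(f) - 1, -1, -1):
        if f[k] != 0:
            return k
    return -1

f = [1, 0, 2]    # 1 + 2x^2
g = [0, 3, -2]   # 3x - 2x^2

print(poly_add(f, g))    # [1, 3, 0], that is 1 + 3x
print(poly_scale(5, f))  # [5, 0, 10]

# Both results stay in R[x]_2: the degree of a sum does not exceed the
# largest of the two degrees, and a non zero scalar preserves the degree.
print(degree(poly_add(f, g)), degree(poly_scale(5, f)))  # 1 2
```

The example also shows that the degree of a sum can be strictly smaller than both degrees, which is why closure is stated for degree "not larger than" $r$.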

Remark 2.1.7 The proof that $\mathbb{R}^n$, $\mathbb{R}[x]$ and $\mathbb{R}[x]_r$ are vector spaces over $\mathbb{R}$ relies on the properties of $\mathbb{R}$ as a field (in fact a ring, since the multiplicative inverse in $\mathbb{R}$ does not play any role).

Exercise 2.1.8 The set $\mathbb{C}^n$, that is the collection of ordered $n$-tuples of complex numbers, can be given the structure of a vector space over $\mathbb{C}$. Indeed, both the operations In. and IIn. considered in the Proposition 2.1.3 when intended for complex numbers make perfect sense:

Ic. $(z_1, \dots, z_n) + (w_1, \dots, w_n) = (z_1 + w_1, \dots, z_n + w_n)$

IIc. $c(z_1, \dots, z_n) = (cz_1, \dots, cz_n)$

with $c \in \mathbb{C}$, and $(z_1, \dots, z_n), (w_1, \dots, w_n) \in \mathbb{C}^n$.

The reader is left to show that $\mathbb{C}^n$ is a vector space over $\mathbb{C}$.

The space $\mathbb{C}^n$ can also be given a structure of vector space over $\mathbb{R}$, by noticing that the product of a complex number by a real number is a complex number. This means that $\mathbb{C}^n$ is closed with respect to the operation of (component-wise) product by a real scalar. The condition IIc. above makes sense when $c \in \mathbb{R}$.

We  next  analyse  some  elementary  properties  of  general  vector  spaces. 

Proposition 2.1.9 Let $V$ be a vector space over $\mathbb{R}$. For any $k \in \mathbb{R}$ and any $v \in V$ it holds that:

(i) $0_{\mathbb{R}} v = 0_V$,

(ii) $k 0_V = 0_V$,

(iii) if $kv = 0_V$ then either $k = 0_{\mathbb{R}}$ or $v = 0_V$,

(iv) $(-k)v = -(kv) = k(-v)$.

Proof (i) From $0_{\mathbb{R}} v = (0_{\mathbb{R}} + 0_{\mathbb{R}})v = 0_{\mathbb{R}} v + 0_{\mathbb{R}} v$, since the sums can be simplified, one has that $0_{\mathbb{R}} v = 0_V$.

(ii) Analogously: $k 0_V = k(0_V + 0_V) = k 0_V + k 0_V$, which yields $k 0_V = 0_V$.

(iii) Let $k \neq 0$, so $k^{-1} \in \mathbb{R}$ exists. Then, $v = 1v = k^{-1}kv = k^{-1} 0_V = 0_V$, with the last equality coming from (ii).

(iv) Since the product is distributive over the sum, from (i) it follows that $kv + (-k)v = (k + (-k))v = 0_{\mathbb{R}} v = 0_V$, that is the first equality. For the second, one writes analogously $kv + k(-v) = k(v - v) = k 0_V = 0_V$. □

Relations (i), (ii), (iii) above are more succinctly expressed by the equivalence:

$$kv = 0_V \iff k = 0_{\mathbb{R}} \ \text{or} \ v = 0_V.$$


2.2  Vector  Subspaces 

Among the subsets of a real vector space $V$, of particular relevance are those which inherit from $V$ a vector space structure.

Definition 2.2.1 Let $V$ be a vector space over $\mathbb{R}$ with respect to the sum $s$ and the product $p$ as given in Definition 2.1.1. Let $W \subset V$ be a subset of $V$. One says that $W$ is a vector subspace of $V$ if the restrictions of $s$ and $p$ to $W$ equip $W$ with the structure of a vector space over $\mathbb{R}$.

In order to establish whether a subset $W \subset V$ of a vector space is a vector subspace, the following can be seen as criteria.

Proposition 2.2.2 Let $W$ be a non empty subset of the real vector space $V$. The following conditions are equivalent.

(i) $W$ is a vector subspace of $V$,

(ii) $W$ is closed with respect to the sum and the product by a scalar, that is

(a) $w + w' \in W$, for any $w, w' \in W$,

(b) $kw \in W$, for any $k \in \mathbb{R}$ and $w \in W$,

(iii) $kw + k'w' \in W$, for any $k, k' \in \mathbb{R}$ and any $w, w' \in W$.

Proof The implications (i) $\Rightarrow$ (ii) and (ii) $\Rightarrow$ (iii) are obvious from the definition.

(iii) $\Rightarrow$ (ii): By taking $k = k' = 1$ one obtains (a), while to show point (b) one takes $k' = 0_{\mathbb{R}}$.

(ii) $\Rightarrow$ (i): Notice that, by hypothesis, $W$ is closed with respect to the sum and product by a scalar. Associativity and commutativity hold in $W$ since they hold in $V$.

One only needs to prove that $W$ has a neutral element and that, for such a neutral element, any vector in $W$ has an opposite in $W$. If $0_V \in W$, then $0_V$ is the zero element in $W$: for any $w \in W$ one has $0_V + w = w + 0_V = w$ since $w \in V$; from (ii)(b) one has $0_{\mathbb{R}} w \in W$ for any $w \in W$; from Proposition 2.1.9 one has $0_{\mathbb{R}} w = 0_V$; collecting these relations, one concludes that $0_V \in W$. If $w \in W$, again from Proposition 2.1.9 one gets that $-w = (-1)w \in W$. □
Exercise 2.2.3 Both $W = \{0_V\} \subset V$ and $W = V \subset V$ are trivial vector subspaces of $V$.

Exercise 2.2.4 We have already seen that $\mathbb{R}[x]_r \subset \mathbb{R}[x]$ are vector spaces with respect to the same operations, so we may conclude that $\mathbb{R}[x]_r$ is a vector subspace of $\mathbb{R}[x]$.

Exercise 2.2.5 Let $v \in V$ be a non zero vector in a vector space, and let

$$\mathcal{L}(v) = \{av : a \in \mathbb{R}\} \subset V$$

be the collection of all multiples of $v$ by a real scalar. Given the elements $w = av$ and $w' = a'v$ in $\mathcal{L}(v)$, from the equality

$$\alpha w + \alpha' w' = (\alpha a + \alpha' a')v \in \mathcal{L}(v)$$

for any $\alpha, \alpha' \in \mathbb{R}$, we see that, from Proposition 2.2.2, $\mathcal{L}(v)$ is a vector subspace of $V$, and we call it the (vector) line generated by $v$.

Exercise 2.2.6 Consider the following subsets $W_i \subset \mathbb{R}^2$:

1. $W_1 = \{(x, y) \in \mathbb{R}^2 : x - 3y = 0\}$,

2. $W_2 = \{(x, y) \in \mathbb{R}^2 : x + y = 1\}$,

3. $W_3 = \{(x, y) \in \mathbb{R}^2 : x \in \mathbb{N}\}$,

4. $W_4 = \{(x, y) \in \mathbb{R}^2 : x^2 - y = 0\}$.

From the previous exercise, one sees that $W_1$ is a vector subspace since $W_1 = \mathcal{L}((3, 1))$. On the other hand, $W_2$, $W_3$, $W_4$ are not vector subspaces of $\mathbb{R}^2$. The zero vector $(0, 0) \notin W_2$, while $W_3$ and $W_4$ are not closed with respect to the product by a scalar, since, for example, $(1, 0) \in W_3$ but $\frac{1}{2}(1, 0) = (\frac{1}{2}, 0) \notin W_3$. Analogously, $(1, 1) \in W_4$ but $2(1, 1) = (2, 2) \notin W_4$.
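The counterexamples of this exercise can be replayed mechanically. In the sketch below each subset is modelled by a membership predicate; the predicate names, the use of `Fraction` for exact arithmetic, and the convention that $0 \in \mathbb{N}$ are assumptions made here (the last one plays no role in the counterexamples).

```python
from fractions import Fraction as F

def in_W1(p):
    x, y = p
    return x - 3 * y == 0

def in_W2(p):
    x, y = p
    return x + y == 1

def in_W3(p):
    x, _ = p
    return F(x).denominator == 1 and x >= 0  # x is a natural number

def in_W4(p):
    x, y = p
    return x * x - y == 0

def scale(a, p):
    return (a * p[0], a * p[1])

print(in_W2((0, 0)))                                     # False: no zero vector
print(in_W3((1, 0)), in_W3(scale(F(1, 2), (1, 0))))      # True False
print(in_W4((1, 1)), in_W4(scale(2, (1, 1))))            # True False

# For W1 = L((3, 1)), multiples of the generator stay in W1.
print(all(in_W1(scale(F(n, 7), (3, 1))) for n in range(-5, 6)))  # True
```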

The next step consists in showing how, given two or more vector subspaces of a real vector space $V$, one can define new vector subspaces of $V$ via suitable operations.

Proposition 2.2.7 The intersection $W_1 \cap W_2$ of any two vector subspaces $W_1$ and $W_2$ of a real vector space $V$ is a vector subspace of $V$.

Proof Consider $a, b \in \mathbb{R}$ and $v, w \in W_1 \cap W_2$. From Proposition 2.2.2 it follows that $av + bw \in W_1$ since $W_1$ is a vector subspace, and also that $av + bw \in W_2$ for the same reason. As a consequence, one has $av + bw \in W_1 \cap W_2$. □

Remark 2.2.8 In general, the union of two vector subspaces of $V$ is not a vector subspace of $V$. As an example, Fig. 2.1 shows that, if $\mathcal{L}(v)$ and $\mathcal{L}(w)$ are generated by two non proportional vectors $v, w \in \mathbb{R}^2$, then $\mathcal{L}(v) \cup \mathcal{L}(w)$ is not closed under the sum, since it does not contain the sum $v + w$, for instance.
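A concrete instance of this remark can be checked in a few lines. The sketch assumes the particular choice $v = (1, 0)$, $w = (0, 1)$ and a hypothetical helper `on_line`, which tests membership in a vector line of $\mathbb{R}^2$ through the vanishing of a $2 \times 2$ determinant.

```python
def on_line(u, v):
    """True if u is a real multiple of the non zero vector v in R^2."""
    return u[0] * v[1] - u[1] * v[0] == 0

v, w = (1, 0), (0, 1)
s = (v[0] + w[0], v[1] + w[1])  # the sum v + w

# v and w lie in the union of the two lines, but their sum does not.
in_union = on_line(s, v) or on_line(s, w)
print(in_union)  # False
```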
Fig. 2.1 The vector line $\mathcal{L}(v + w)$ with respect to the vector lines $\mathcal{L}(v)$ and $\mathcal{L}(w)$


Proposition 2.2.9 Let $W_1$ and $W_2$ be vector subspaces of the real vector space $V$ and let $W_1 + W_2$ denote

$$W_1 + W_2 = \{v \in V \mid v = w_1 + w_2;\ w_1 \in W_1,\ w_2 \in W_2\} \subset V.$$

Then $W_1 + W_2$ is the smallest vector subspace of $V$ which contains the union $W_1 \cup W_2$.

Proof Let $a, a' \in \mathbb{R}$ and $v, v' \in W_1 + W_2$; this means that there exist $w_1, w_1' \in W_1$ and $w_2, w_2' \in W_2$, so that $v = w_1 + w_2$ and $v' = w_1' + w_2'$. Since both $W_1$ and $W_2$ are vector subspaces of $V$, from the identity

$$av + a'v' = aw_1 + aw_2 + a'w_1' + a'w_2' = (aw_1 + a'w_1') + (aw_2 + a'w_2'),$$

one has $aw_1 + a'w_1' \in W_1$ and $aw_2 + a'w_2' \in W_2$. It follows that $W_1 + W_2$ is a vector subspace of $V$.

It holds that $W_1 + W_2 \supseteq W_1 \cup W_2$: if $w_1 \in W_1$, it is indeed $w_1 = w_1 + 0_V$ in $W_1 + W_2$; one similarly shows that $W_2 \subset W_1 + W_2$.

Finally, let $Z$ be a vector subspace of $V$ containing $W_1 \cup W_2$; then for any $w_1 \in W_1$ and $w_2 \in W_2$ it must be $w_1 + w_2 \in Z$. This implies $Z \supseteq W_1 + W_2$, and then $W_1 + W_2$ is the smallest of such vector subspaces $Z$. □

Definition 2.2.10 If $W_1$ and $W_2$ are vector subspaces of the real vector space $V$, the vector subspace $W_1 + W_2$ of $V$ is called the sum of $W_1$ and $W_2$.

The previous proposition and definition are easily generalised, in particular:

Definition 2.2.11 If $W_1, \dots, W_n$ are vector subspaces of the real vector space $V$, the vector subspace

$$W_1 + \cdots + W_n = \{v \in V \mid v = w_1 + \cdots + w_n;\ w_i \in W_i,\ i = 1, \dots, n\}$$

of $V$ is the sum of $W_1, \dots, W_n$.
Definition 2.2.12 Let $W_1$ and $W_2$ be vector subspaces of the real vector space $V$. The sum $W_1 + W_2$ is called direct if $W_1 \cap W_2 = \{0_V\}$. A direct sum is denoted $W_1 \oplus W_2$.

Proposition 2.2.13 Let $W_1$, $W_2$ be vector subspaces of the real vector space $V$. Their sum $W = W_1 + W_2$ is direct if and only if any element $v \in W_1 + W_2$ has a unique decomposition as $v = w_1 + w_2$ with $w_i \in W_i$, $i = 1, 2$.

Proof We first suppose that the sum $W_1 + W_2$ is direct, that is $W_1 \cap W_2 = \{0_V\}$. If there exists an element $v \in W_1 + W_2$ with $v = w_1 + w_2 = w_1' + w_2'$, and $w_i, w_i' \in W_i$, then $w_1 - w_1' = w_2' - w_2$ and such an element would belong to both $W_1$ and $W_2$. This would then be zero, since $W_1 \cap W_2 = \{0_V\}$, and then $w_1 = w_1'$ and $w_2 = w_2'$.

Suppose now that any element $v \in W_1 + W_2$ has a unique decomposition $v = w_1 + w_2$ with $w_i \in W_i$, $i = 1, 2$. Let $v \in W_1 \cap W_2$; then $v \in W_1$ and $-v \in W_2$, which gives the decomposition $0_V = v + (-v) \in W_1 + W_2$ of the zero vector. But clearly also $0_V = 0_V + 0_V$ and, the decomposition for $0_V$ being unique, this gives $v = 0_V$. □
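To see the unique decomposition at work, consider two distinct vector lines in $\mathbb{R}^2$. The sketch below is an illustration under our own assumptions: the generators $g_1 = (1, 0)$, $g_2 = (1, 1)$, the function name `decompose` and the use of Cramer's rule (a standard fact, not introduced in the text at this point) are choices made here. When the two generators are not proportional the lines meet only in the zero vector, and the coefficients of the decomposition are uniquely determined.

```python
from fractions import Fraction

def decompose(v, g1, g2):
    """Return (a, b) with v = a*g1 + b*g2 for integer vectors in R^2."""
    det = g1[0] * g2[1] - g1[1] * g2[0]
    if det == 0:
        raise ValueError("proportional generators: the sum is not direct")
    a = Fraction(v[0] * g2[1] - v[1] * g2[0], det)
    b = Fraction(g1[0] * v[1] - g1[1] * v[0], det)
    return a, b

g1, g2 = (1, 0), (1, 1)
a, b = decompose((5, 2), g1, g2)
print(a, b)  # 3 2, that is (5, 2) = 3*(1, 0) + 2*(1, 1)
```

Here $w_1 = 3 g_1 \in \mathcal{L}(g_1)$ and $w_2 = 2 g_2 \in \mathcal{L}(g_2)$ are the only possible summands, in agreement with Proposition 2.2.13.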

This proposition gives a natural way to generalise the notion of direct sum to an arbitrary number of vector subspaces of a given vector space.

Definition 2.2.14 Let $W_1, \dots, W_n$ be vector subspaces of the real vector space $V$. The sum $W_1 + \cdots + W_n$ is called direct if any of its elements has a unique decomposition as $v = w_1 + \cdots + w_n$ with $w_i \in W_i$, $i = 1, \dots, n$. The direct sum vector subspace is denoted $W_1 \oplus \cdots \oplus W_n$.


2.3  Linear  Combinations 

We have seen in Chap. 1 that, given a cartesian coordinate system $\Sigma = (O; \mathbf{i}, \mathbf{j}, \mathbf{k})$ for the space $S$, any vector $\mathbf{v} \in V^3_O$ can be written as $\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$. One says that $\mathbf{v}$ is a linear combination of $\mathbf{i}, \mathbf{j}, \mathbf{k}$. From Definition 1.2.5 we also know that, given $\Sigma$, the components $(a, b, c)$ are uniquely determined by $\mathbf{v}$. For this one says that $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are linearly independent. In this section we introduce these notions for an arbitrary vector space.

Definition 2.3.1 Let $v_1, \dots, v_n$ be arbitrary elements of a real vector space $V$. A vector $v \in V$ is a linear combination of $v_1, \dots, v_n$ if there exist $n$ scalars $\lambda_1, \dots, \lambda_n \in \mathbb{R}$, such that

$$v = \lambda_1 v_1 + \cdots + \lambda_n v_n.$$

The collection of all linear combinations of the vectors $v_1, \dots, v_n$ is denoted by $\mathcal{L}(v_1, \dots, v_n)$. If $I \subset V$ is an arbitrary subset of $V$, by $\mathcal{L}(I)$ one denotes the collection of all possible linear combinations of vectors in $I$, that is

$$\mathcal{L}(I) = \{\lambda_1 v_1 + \cdots + \lambda_n v_n \mid \lambda_i \in \mathbb{R},\ v_i \in I,\ n \geq 0\}.$$

The set $\mathcal{L}(I)$ is also called the linear span of $I$.

Proposition 2.3.2 The space $\mathcal{L}(v_1, \dots, v_n)$ is a vector subspace of $V$, called the space generated by $v_1, \dots, v_n$ or the linear span of the vectors $v_1, \dots, v_n$.

Proof After Proposition 2.2.2, it is enough to show that $\mathcal{L}(v_1, \dots, v_n)$ is closed for the sum and the product by a scalar. Let $v, w \in \mathcal{L}(v_1, \dots, v_n)$; it is then $v = \lambda_1 v_1 + \cdots + \lambda_n v_n$ and $w = \mu_1 v_1 + \cdots + \mu_n v_n$, for scalars $\lambda_1, \dots, \lambda_n$ and $\mu_1, \dots, \mu_n$. Recalling point (2) in Definition 2.1.1, one has

$$v + w = (\lambda_1 + \mu_1)v_1 + \cdots + (\lambda_n + \mu_n)v_n \in \mathcal{L}(v_1, \dots, v_n).$$

Next, let $\alpha \in \mathbb{R}$. Again from Definition 2.1.1 (point (4)), one has $\alpha v = (\alpha\lambda_1)v_1 + \cdots + (\alpha\lambda_n)v_n$, which gives $\alpha v \in \mathcal{L}(v_1, \dots, v_n)$. □

Exercise 2.3.3 The following are two examples for the notion just introduced.

(1) Clearly one has $V^2_O = \mathcal{L}(\mathbf{i}, \mathbf{j})$ and $V^3_O = \mathcal{L}(\mathbf{i}, \mathbf{j}, \mathbf{k})$.

(2) Let $v = (1, 0, -1)$ and $w = (2, 0, 0)$ be two vectors in $\mathbb{R}^3$; it is easy to see that $\mathcal{L}(v, w)$ is a proper subset of $\mathbb{R}^3$. For example, the vector $u = (0, 1, 0) \notin \mathcal{L}(v, w)$. If $u$ were in $\mathcal{L}(v, w)$, there should be $\alpha, \beta \in \mathbb{R}$ such that

$$(0, 1, 0) = \alpha(1, 0, -1) + \beta(2, 0, 0) = (\alpha + 2\beta, 0, -\alpha).$$

No choice of $\alpha, \beta \in \mathbb{R}$ can satisfy this vector identity, since the second component equality would give $1 = 0$, independently of $\alpha, \beta$.
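The question whether a given vector belongs to a linear span can also be settled by a computation. The sketch below is one possible approach under our own assumptions, not the argument of the text: it uses Gaussian elimination over the rationals to compute a rank (a notion not yet introduced at this point of the exposition), and declares $u \in \mathcal{L}(v_1, \dots, v_n)$ exactly when adding $u$ to the list does not increase the rank. The names `rank` and `in_span` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination on the given vectors."""
    m = [[Fraction(c) for c in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(u, vectors):
    """True if u is a linear combination of the given vectors."""
    return rank(list(vectors) + [u]) == rank(list(vectors))

v, w = (1, 0, -1), (2, 0, 0)
print(in_span((0, 1, 0), [v, w]))   # False, as found in the exercise
print(in_span((3, 0, -1), [v, w]))  # True: (3, 0, -1) = v + w
```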

It is interesting to explore which subsets $I \subset V$ yield $\mathcal{L}(I) = V$. Clearly, one has $V = \mathcal{L}(V)$. The example (1) above shows that there are proper subsets $I \subset V$ whose linear span coincides with $V$ itself. We already know that $V^2_O = \mathcal{L}(\mathbf{i}, \mathbf{j})$ and that $V^3_O = \mathcal{L}(\mathbf{i}, \mathbf{j}, \mathbf{k})$: both $V^3_O$ and $V^2_O$ are generated by a finite number of (their) vectors. This is not always the case, as the following exercise shows.

Exercise 2.3.4 The real vector space $\mathbb{R}[x]$ is not generated by a finite number of vectors. Indeed, let $f_1(x), \dots, f_n(x) \in \mathbb{R}[x]$ be arbitrary polynomials. Any $p(x) \in \mathcal{L}(f_1, \dots, f_n)$ is written as

$$p(x) = \lambda_1 f_1(x) + \cdots + \lambda_n f_n(x)$$

with suitable $\lambda_1, \dots, \lambda_n \in \mathbb{R}$. If one writes $d_i = \deg(f_i)$ and $d = \max\{d_1, \dots, d_n\}$, from Remark A.3.5 one has that

$$\deg(p(x)) = \deg(\lambda_1 f_1(x) + \cdots + \lambda_n f_n(x)) \leq \max\{d_1, \dots, d_n\} = d.$$

This means that any polynomial of degree $d + 1$ or higher is not contained in $\mathcal{L}(f_1, \dots, f_n)$. This is the case for any finite $n$, giving a finite $d$; we conclude that, if $n$ is finite, any $\mathcal{L}(I)$ with $I = (f_1(x), \dots, f_n(x))$ is a proper subset of $\mathbb{R}[x]$, which can then not be generated by a finite number of polynomials.

On the other hand, $\mathbb{R}[x]$ is indeed the linear span of the infinite set

$$\{1, x, x^2, \dots, x^i, \dots\}.$$

Definition 2.3.5 A vector space $V$ over $\mathbb{R}$ is said to be finitely generated if there exists a finite number of elements $v_1, \dots, v_n$ in $V$ which are such that $V = \mathcal{L}(v_1, \dots, v_n)$. In such a case, the set $\{v_1, \dots, v_n\}$ is called a system of generators for $V$.

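Proposition 2.3.6 Let $I \subset V$ and $v \in V$. It holds that

$$\mathcal{L}(\{v\} \cup I) = \mathcal{L}(I) \iff v \in \mathcal{L}(I).$$

Proof "$\Rightarrow$" Let us assume that $\mathcal{L}(\{v\} \cup I) = \mathcal{L}(I)$. Since $v \in \mathcal{L}(\{v\} \cup I)$, then $v \in \mathcal{L}(I)$.

"$\Leftarrow$" We shall prove the claim under the hypothesis that we have a finite system $I = \{v_1, \dots, v_n\}$. The inclusion $\mathcal{L}(I) \subset \mathcal{L}(\{v\} \cup I)$ is obvious. To prove the inclusion $\mathcal{L}(\{v\} \cup I) \subset \mathcal{L}(I)$, consider an arbitrary element $w \in \mathcal{L}(\{v\} \cup I)$, so that $w = \alpha v + \mu_1 v_1 + \cdots + \mu_n v_n$. By the hypothesis, $v \in \mathcal{L}(I)$ so it is $v = \lambda_1 v_1 + \cdots + \lambda_n v_n$. We can then write

$$w = \alpha(\lambda_1 v_1 + \cdots + \lambda_n v_n) + \mu_1 v_1 + \cdots + \mu_n v_n.$$

From the properties of the sum of vectors in $V$, one concludes that $w \in \mathcal{L}(v_1, \dots, v_n) = \mathcal{L}(I)$. □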

Remark 2.3.7 From the previous proposition one has also the identity

$$\mathcal{L}(v_1, \dots, v_n, 0_V) = \mathcal{L}(v_1, \dots, v_n)$$

for any $v_1, \dots, v_n \in V$.

If $I$ is a system of generators for $V$, the next question to address is whether $I$ contains a minimal set of generators for $V$, that is whether there exists a set $J \subset I$ (with $J \neq I$) such that $\mathcal{L}(J) = \mathcal{L}(I) = V$. The answer to this question leads to the notion of linear independence for a set of vectors.

Definition 2.3.8 Given a collection $I = \{v_1, \dots, v_n\}$ of vectors in a real vector space $V$, the elements of $I$ are called linearly independent on $\mathbb{R}$, and the system $I$ is said to be free, if the following implication holds,

$$\lambda_1 v_1 + \cdots + \lambda_n v_n = 0_V \implies \lambda_1 = \cdots = \lambda_n = 0_{\mathbb{R}}.$$

That is, if the only linear combination of elements of $I$ giving the zero vector is the one whose coefficients are all zero.

Analogously, an infinite system $I \subset V$ is said to be free if any of its finite subsets is free.

The vectors $v_1, \dots, v_n \in V$ are said to be linearly dependent if they are not linearly independent, that is if there are scalars $(\lambda_1, \dots, \lambda_n) \neq (0, \dots, 0)$ such that $\lambda_1 v_1 + \cdots + \lambda_n v_n = 0_V$.

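Exercise 2.3.9 It is clear that $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are linearly independent in $V^3_O$, while the vectors $v_1 = \mathbf{i} + \mathbf{j}$, $v_2 = \mathbf{j} - \mathbf{k}$ and $v_3 = 2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$ are linearly dependent, since one computes that $2v_1 - 3v_2 - v_3 = 0$.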
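The dependence relation of this exercise is easily verified in components. The sketch below also evaluates the mixed product of Proposition 1.3.20 on the two triples: that three vectors of $\mathbb{R}^3$ are linearly independent exactly when their mixed product is non zero is a standard fact which we use here as an assumption, since it is not established in the text at this point. The names `combo` and `mixed` are hypothetical.

```python
def combo(coeffs, vectors):
    """The linear combination sum_i coeffs[i] * vectors[i], component-wise."""
    n = len(vectors[0])
    return tuple(sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(n))

def mixed(u, v, w):
    """Mixed product u . (v ^ w) in components (Proposition 1.3.20)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            + u[1] * (v[2] * w[0] - v[0] * w[2])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
v1, v2, v3 = (1, 1, 0), (0, 1, -1), (2, -1, 3)  # i + j, j - k, 2i - j + 3k

print(combo((2, -3, -1), (v1, v2, v3)))  # (0, 0, 0)
print(mixed(i, j, k))                    # 1
print(mixed(v1, v2, v3))                 # 0
```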

Proposition 2.3.10 Let $V$ be a real vector space and $I = \{v_1, \dots, v_n\}$ be a collection of vectors in $V$. The following properties hold true:

(i) if $0_V \in I$, then $I$ is not free,

(ii) $I$ is not free if and only if one of the elements $v_i$ is a linear combination of the other elements $v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n$,

(iii) if $I$ is not free, then any $J \supseteq I$ is not free,

(iv) if $I$ is free, then any $J$ such that $J \subset I$ is free; that is any subsystem of a free system is free.

Proof (i) Without loss of generality we suppose that $v_1 = 0_V$. Then, one has

$$1_{\mathbb{R}} v_1 + 0_{\mathbb{R}} v_2 + \cdots + 0_{\mathbb{R}} v_n = 0_V,$$

which amounts to saying that the zero vector can be written as a linear combination of elements in $I$ with a non zero coefficient.

(ii) Suppose $I$ is not free. Then, there exist scalars $(\lambda_1, \dots, \lambda_n) \neq (0, \dots, 0)$ giving the combination $\lambda_1 v_1 + \cdots + \lambda_n v_n = 0_V$. Without loss of generality take $\lambda_1 \neq 0$; so $\lambda_1$ is invertible and we can write

$$v_1 = \lambda_1^{-1}(-\lambda_2 v_2 - \cdots - \lambda_n v_n) \in \mathcal{L}(v_2, \dots, v_n).$$

In order to prove the converse, we start by assuming that a vector $v_i$ is a linear combination

$$v_i = \lambda_1 v_1 + \cdots + \lambda_{i-1} v_{i-1} + \lambda_{i+1} v_{i+1} + \cdots + \lambda_n v_n.$$

This identity can be written in the form

$$\lambda_1 v_1 + \cdots + \lambda_{i-1} v_{i-1} - v_i + \lambda_{i+1} v_{i+1} + \cdots + \lambda_n v_n = 0_V.$$

The zero vector is then written as a linear combination with coefficients not all identically zero, since the coefficient of $v_i$ is $-1$. This amounts to saying that the system $I$ is not free.

We leave the reader to show the obvious points (iii) and (iv). □
2.4 Bases of a Vector Space

Given  a  real  vector  space  V,  in  this  section  we  determine  its  smallest  possible  systems 
of  generators,  together  with  their  cardinalities. 

Proposition 2.4.1 Let V be a real vector space, with $v_1, \dots, v_n \in V$. The following facts are equivalent:

(i) the elements $v_1, \dots, v_n$ are linearly independent,

(ii) $v_1 \neq 0_V$ and, for any $i \geq 2$, the vector $v_i$ is not a linear combination of $v_1, \dots, v_{i-1}$.

Proof The implication (i) $\Rightarrow$ (ii) directly follows from the Proposition 2.3.10.

To show the implication (ii) $\Rightarrow$ (i) we start by considering a combination $\lambda_1 v_1 + \cdots + \lambda_n v_n = 0_V$. Under the hypothesis, $v_n$ is not a linear combination of $v_1, \dots, v_{n-1}$, so it must be $\lambda_n = 0$: were it not, one could write $v_n = \lambda_n^{-1}(-\lambda_1 v_1 - \cdots - \lambda_{n-1} v_{n-1})$. We are then left with $\lambda_1 v_1 + \cdots + \lambda_{n-1} v_{n-1} = 0_V$, and an analogous reasoning leads to $\lambda_{n-1} = 0$. After $n - 1$ similar steps, one has $\lambda_1 v_1 = 0_V$; since $v_1 \neq 0_V$ by hypothesis, it must be (see 2.1.5) that $\lambda_1 = 0$. □

Theorem  2.4.2  Any  finite  system  of  generators  for  a  vector  space  V  contains  a  free 
system  of  generators  for  V. 

Proof Let $I = \{v_1, \dots, v_s\}$ be a system of generators for a real vector space V. Recalling the Remark 2.3.7, we can take $v_i \neq 0_V$ for any $i = 1, \dots, s$. We define iteratively a system of subsets of I, as follows:

• take $I_1 = I = \{v_1, \dots, v_s\}$,

• if $v_2 \in \mathcal{L}(v_1)$, take $I_2 = I_1 \setminus \{v_2\}$; if $v_2 \notin \mathcal{L}(v_1)$, take $I_2 = I_1$,

• if $v_3 \in \mathcal{L}(v_1, v_2)$, take $I_3 = I_2 \setminus \{v_3\}$; if $v_3 \notin \mathcal{L}(v_1, v_2)$, take $I_3 = I_2$,

• iterate the steps above.

The whole procedure consists in examining any element in the starting $I_1 = I$, and deleting it if it is a linear combination of the previous ones. After s steps, one ends up with a chain $I_1 \supseteq \cdots \supseteq I_s$.

Notice that, for any $j = 2, \dots, s$, it is $\mathcal{L}(I_j) = \mathcal{L}(I_{j-1})$. It is indeed either $I_j = I_{j-1}$ (which makes the claim obvious) or $I_{j-1} = I_j \cup \{v_j\}$, with $v_j \in \mathcal{L}(v_1, \dots, v_{j-1}) \subseteq \mathcal{L}(I_{j-1})$; from Proposition 2.3.6, it follows that $\mathcal{L}(I_j) = \mathcal{L}(I_{j-1})$.

One has then $\mathcal{L}(I) = \mathcal{L}(I_1) = \cdots = \mathcal{L}(I_s)$, and $I_s$ is a system of generators of V. Since no element in $I_s$ is a linear combination of the previous ones, the Proposition 2.4.1 shows that $I_s$ is free. □
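The deletion procedure used in the proof can be sketched in a few lines of Python for vectors of $\mathbb{R}^n$ given by their components. This is only an illustration under stated assumptions: membership in a linear span is tested by comparing ranks computed with Gaussian elimination over the rationals, a tool not introduced in this section, and the names `rank` and `extract_free_system` are hypothetical. Testing against the vectors kept so far is equivalent to testing against all the previous ones, since the two sets have the same linear span.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

def extract_free_system(generators):
    """Keep a vector only if it is not a combination of the ones kept before it."""
    kept = []
    for v in generators:
        # Adding v raises the rank exactly when v is not in the span of `kept`.
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept

gens = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
print(extract_free_system(gens))  # [(1, 0, 0), (1, 1, 0), (0, 0, 1)]
```

The result depends on the order in which the generators are listed, in agreement with the fact that a system of generators may contain several different bases.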

Definition 2.4.3 Let V be a real vector space. An ordered system of vectors $I = (v_1, \dots, v_n)$ in V is called a basis of V if I is a free system of generators for V, that is $V = \mathcal{L}(v_1, \dots, v_n)$ and $v_1, \dots, v_n$ are linearly independent.

Corollary 2.4.4 Any finite system of generators for a vector space contains (at least) a basis. This means also that any finitely generated vector space has a basis.

Proof It follows directly from the Theorem 2.4.2. □


Exercise 2.4.5 Consider the vector space $\mathbb{R}^3$ and the system of vectors $I = \{v_1, \dots, v_5\}$


with 


Following Theorem 2.4.2, we determine a basis for $\mathcal{L}(v_1, v_2, v_3, v_4, v_5)$.

• At the first step $I_1 = I$.

• Since $v_2 = -2v_1$, so that $v_2 \in \mathcal{L}(v_1)$, delete $v_2$ and take $I_2 = I_1 \setminus \{v_2\}$.

• One has $v_3 \notin \mathcal{L}(v_1)$, so keep $v_3$ and take $I_3 = I_2$.

• One has $v_4 \in \mathcal{L}(v_1, v_3)$ if and only if there exist $\alpha, \beta \in \mathbb{R}$ such that $v_4 = \alpha v_1 + \beta v_3$, that is $(1, -1, 2) = (\alpha + 2\beta, \alpha, -\alpha + \beta)$. By equating components, one has $\alpha = -1$, $\beta = 1$. This shows that $v_4 = -v_1 + v_3 \in \mathcal{L}(v_1, v_3)$; therefore delete $v_4$ and take $I_4 = I_3 \setminus \{v_4\}$.

• Similarly one shows that $v_5 \notin \mathcal{L}(v_1, v_3)$. A basis for $\mathcal{L}(I)$ is then $I_5 = I_4 = (v_1, v_3, v_5)$.

The  next  theorem  characterises  free  systems. 

Theorem 2.4.6 A system $I = \{v_1, \dots, v_n\}$ of vectors in V is free if and only if any element in $\mathcal{L}(v_1, \dots, v_n)$ can be written in a unique way as a linear combination of the elements $v_1, \dots, v_n$.

Proof We assume that I is free and that $\mathcal{L}(v_1, \dots, v_n)$ contains a vector, say v, which has two linear decompositions with respect to the vectors $v_i$:

$$v = \lambda_1 v_1 + \cdots + \lambda_n v_n = \mu_1 v_1 + \cdots + \mu_n v_n.$$

This identity would give $(\lambda_1 - \mu_1) v_1 + \cdots + (\lambda_n - \mu_n) v_n = 0_V$; since the elements $v_i$ are linearly independent it would read

$$\lambda_1 - \mu_1 = 0, \quad \dots, \quad \lambda_n - \mu_n = 0,$$

that is $\lambda_i = \mu_i$ for any $i = 1, \dots, n$. This says that the two linear expressions above coincide and v is written in a unique way.

We assume next that any element in $\mathcal{L}(v_1, \dots, v_n)$ has a unique linear decomposition with respect to the vectors $v_i$. This means that the zero vector $0_V \in \mathcal{L}(v_1, \dots, v_n)$ has the unique decomposition $0_V = 0_{\mathbb{R}} v_1 + \cdots + 0_{\mathbb{R}} v_n$. Let us consider the expression $\lambda_1 v_1 + \cdots + \lambda_n v_n = 0_V$; since the linear decomposition of $0_V$ is unique, it is $\lambda_i = 0$ for any $i = 1, \dots, n$. This says that the vectors $v_1, \dots, v_n$ are linearly independent. □

Corollary 2.4.7 Let $v_1, \dots, v_n$ be elements of a real vector space V. The system $I = (v_1, \dots, v_n)$ is a basis for V if and only if any element $v \in V$ can be written in a unique way as $v = \lambda_1 v_1 + \cdots + \lambda_n v_n$.

Definition 2.4.8 Let $I = (v_1, \dots, v_n)$ be a basis for the real vector space V. Any $v \in V$ is then written as a linear combination $v = \lambda_1 v_1 + \cdots + \lambda_n v_n$ in a unique way. The scalars $\lambda_1, \dots, \lambda_n$ (which are uniquely determined by Corollary 2.4.7) are called the components of v on the basis I. We denote this by

$$v = (\lambda_1, \dots, \lambda_n)_I.$$

Remark 2.4.9 Notice that we have taken a free system in a vector space V and a system of generators for V not to be ordered sets while, on the other hand, the Definition 2.4.3 refers to a basis as an ordered set. This choice is motivated by the fact that it is more useful to consider the components of a vector on a given basis as an ordered array of scalars. For example, if $I = (v_1, v_2)$ is a basis for V, so is $J = (v_2, v_1)$. But one considers I equivalent to J as systems of generators for V, not as bases.

Exercise 2.4.10 With $\mathcal{E} = (i, j)$ and $\mathcal{E}' = (j, i)$ two bases for $V^2_O$, the vector $v = 2i + 3j$ has the following components

$$v = (2, 3)_{\mathcal{E}} = (3, 2)_{\mathcal{E}'}$$

when expressed with respect to them.
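The dependence of the components on the ordering of the basis can also be seen computationally. The sketch below is a minimal illustration, assuming that vectors are given as tuples in $\mathbb{R}^n$ (so that i and j are identified with (1, 0) and (0, 1)) and that the components are found by Gauss-Jordan elimination over the rationals; the function name `components` is hypothetical.

```python
from fractions import Fraction

def components(v, basis):
    """Solve v = l_1 b_1 + ... + l_n b_n for (l_1, ..., l_n).

    Assumes `basis` really is an ordered basis of R^n (n vectors, n entries each).
    """
    n = len(basis)
    # Augmented matrix: column k holds the basis vector b_k, last column holds v.
    rows = [[Fraction(basis[k][i]) for k in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next(k for k in range(col, n) if rows[k][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for k in range(n):
            if k != col:
                factor = rows[k][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[col])]
    return tuple(row[-1] for row in rows)

i, j = (1, 0), (0, 1)
v = (2, 3)                             # v = 2i + 3j
print(components(v, [i, j]) == (2, 3)) # True
print(components(v, [j, i]) == (3, 2)) # True
```

With a basis which is not the canonical one, for instance ((1, 1), (1, -1)), the same vector has components (5/2, -1/2).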

Remark 2.4.11 Consider the real vector space $\mathbb{R}^n$ and the vectors

$$e_1 = (1, 0, \dots, 0),$$
$$e_2 = (0, 1, \dots, 0),$$
$$\vdots$$
$$e_n = (0, 0, \dots, 1).$$

Since any element $v = (x_1, \dots, x_n)$ can be uniquely written as

$$(x_1, \dots, x_n) = x_1 e_1 + \cdots + x_n e_n,$$

the system $\mathcal{E} = (e_1, \dots, e_n)$ is a basis for $\mathbb{R}^n$.

Definition 2.4.12 The system $\mathcal{E} = (e_1, \dots, e_n)$ above is called the canonical basis for $\mathbb{R}^n$.

The canonical basis for $\mathbb{R}^2$ is $\mathcal{E} = (e_1, e_2)$, with $e_1 = (1, 0)$ and $e_2 = (0, 1)$; the canonical basis for $\mathbb{R}^3$ is $\mathcal{E} = (e_1, e_2, e_3)$, with $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$.

Remark 2.4.13 We have meaningfully introduced the notion of a canonical basis for $\mathbb{R}^n$. Our analysis so far should nonetheless make it clear that for an arbitrary vector space V over $\mathbb{R}$ there is no canonical choice of a basis. The exercises that follow indeed show that some vector spaces have bases which appear more natural than others, in a sense.

Exercise 2.4.14 We refer to the Exercise 2.1.8 and consider $\mathbb{C}$ as a vector space over $\mathbb{R}$. As such it is generated by the two elements 1 and i since any complex number can be written as $z = a + ib$, with $a, b \in \mathbb{R}$. Since the elements 1, i are linearly independent over $\mathbb{R}$ they are a basis over $\mathbb{R}$ for $\mathbb{C}$.

As already seen in the Exercise 2.1.8, $\mathbb{C}^n$ is a vector space both over $\mathbb{C}$ and over $\mathbb{R}$. As a $\mathbb{C}$-vector space, $\mathbb{C}^n$ has canonical basis $\mathcal{E} = (e_1, \dots, e_n)$, where the elements $e_i$ are given as in the Remark 2.4.11. For example, the canonical basis for $\mathbb{C}^2$ is $\mathcal{E} = (e_1, e_2)$, with $e_1 = (1, 0)$, $e_2 = (0, 1)$.

As a real vector space, $\mathbb{C}^n$ has no canonical basis. It is natural to consider for it the following basis $\mathcal{B} = (b_1, c_1, \dots, b_n, c_n)$, made of the 2n following elements,

$$b_1 = (1, 0, \dots, 0), \quad c_1 = (i, 0, \dots, 0),$$
$$b_2 = (0, 1, \dots, 0), \quad c_2 = (0, i, \dots, 0),$$
$$\vdots$$
$$b_n = (0, 0, \dots, 1), \quad c_n = (0, 0, \dots, i).$$

For $\mathbb{C}^2$ such a basis is $\mathcal{B} = (b_1, c_1, b_2, c_2)$, with $b_1 = (1, 0)$, $c_1 = (i, 0)$, and $b_2 = (0, 1)$, $c_2 = (0, i)$.

Exercise 2.4.15 The real vector space $\mathbb{R}[x]_r$ has a natural basis given by all the monomials $(1, x, x^2, \dots, x^r)$ with degree not greater than r, since any element $p(x) \in \mathbb{R}[x]_r$ can be written in a unique way as

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_r x^r,$$

with $a_i \in \mathbb{R}$.

Remark 2.4.16 We have seen in Chap. 1 that, by introducing a cartesian coordinate system in $V^3_O$ and with the notion of components for the vectors, the vector space operations in $V^3_O$ can be written in terms of operations among components. This fact is generalised in the following way.

Let $I = (v_1, \dots, v_n)$ be a basis for V. Let $v, w \in V$, with $v = (\lambda_1, \dots, \lambda_n)_I$ and $w = (\mu_1, \dots, \mu_n)_I$ the corresponding components with respect to I. We compute the components, with respect to I, of the vector $v + w$. We have

$$v + w = (\lambda_1 v_1 + \cdots + \lambda_n v_n) + (\mu_1 v_1 + \cdots + \mu_n v_n) = (\lambda_1 + \mu_1) v_1 + \cdots + (\lambda_n + \mu_n) v_n,$$

so we can write

$$v + w = (\lambda_1 + \mu_1, \dots, \lambda_n + \mu_n)_I.$$

Next, with $a \in \mathbb{R}$ we also have

$$a v = a(\lambda_1 v_1 + \cdots + \lambda_n v_n) = (a\lambda_1) v_1 + \cdots + (a\lambda_n) v_n,$$

so we can write

$$a v = (a\lambda_1, \dots, a\lambda_n)_I.$$

If $z = av + bw$ with $z = (\xi_1, \dots, \xi_n)_I$, it is immediate to see that

$$(\xi_1, \dots, \xi_n)_I = (a\lambda_1 + b\mu_1, \dots, a\lambda_n + b\mu_n)_I$$

or equivalently

$$\xi_i = a\lambda_i + b\mu_i, \quad \text{for any } i = 1, \dots, n.$$

Proposition 2.4.17 Let V be a vector space over $\mathbb{R}$, and $I = (v_1, \dots, v_n)$ a basis for V. Consider a system

$$w_1 = (\lambda_{11}, \dots, \lambda_{1n})_I, \quad w_2 = (\lambda_{21}, \dots, \lambda_{2n})_I, \quad \dots, \quad w_s = (\lambda_{s1}, \dots, \lambda_{sn})_I$$

of vectors in V, and denote $z = (\xi_1, \dots, \xi_n)_I$. One has

$$z = a_1 w_1 + \cdots + a_s w_s \iff \xi_i = a_1 \lambda_{1i} + \cdots + a_s \lambda_{si} \quad \text{for any } i = 1, \dots, n.$$

The i-th component of the linear combination z of the vectors $w_k$ is given by the same linear combination of the i-th components of the vectors $w_k$.

Proof It comes as a direct generalisation of the previous remark. □

Corollary 2.4.18 With the same notations as before, one has that

(a) the vectors $w_1, \dots, w_s$ are linearly independent in V if and only if the corresponding n-tuples of components $(\lambda_{11}, \dots, \lambda_{1n}), \dots, (\lambda_{s1}, \dots, \lambda_{sn})$ are linearly independent in $\mathbb{R}^n$,

(b) the vectors $w_1, \dots, w_s$ form a system of generators for V if and only if the corresponding n-tuples of components $(\lambda_{11}, \dots, \lambda_{1n}), \dots, (\lambda_{s1}, \dots, \lambda_{sn})$ generate $\mathbb{R}^n$.
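The corollary reduces a question on abstract vectors to a question on n-tuples. As a minimal sketch, the lines below take polynomials in $\mathbb{R}[x]_2$, written by their components on the basis $(1, x, x^2)$, and test three of them for linear independence. The test used here is the non-vanishing of a 3x3 determinant, a criterion borrowed from outside this section and assumed without proof; the names `det3` and `independent` are hypothetical.

```python
def det3(r1, r2, r3):
    """Determinant of the 3x3 matrix with rows r1, r2, r3."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))

def independent(w1, w2, w3):
    """Assumed criterion: three triples are independent iff det3 is non-zero."""
    return det3(w1, w2, w3) != 0

# Components on the basis (1, x, x^2) of R[x]_2.
p1 = (1, 1, 0)   # 1 + x
p2 = (0, 1, 1)   # x + x^2
p3 = (1, 2, 1)   # 1 + 2x + x^2 = p1 + p2
p4 = (1, 0, 1)   # 1 + x^2

print(independent(p1, p2, p3))  # False: p3 is a combination of p1 and p2
print(independent(p1, p2, p4))  # True
```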

A  free  system  can  be  completed  to  a  basis  for  a  given  vector  space. 

Theorem 2.4.19 Let V be a finitely generated real vector space. Any free finite system is contained in a basis for V.

Proof Let $I = \{v_1, \dots, v_s\}$ be a free system for the real vector space V. By the Corollary 2.4.4, V has a basis, that we denote $\mathcal{B} = (e_1, \dots, e_n)$. The set $I \cup \mathcal{B} = \{v_1, \dots, v_s, e_1, \dots, e_n\}$ obviously generates V. By applying the procedure given in the Theorem 2.4.2, the first s vectors are not deleted, since they are linearly independent by hypothesis; the subsystem $\mathcal{B}'$ one ends up with at the end of the procedure will then be a basis for V that contains I. □
2.5 The Dimension of a Vector Space

The  following  (somewhat  intuitive)  result  is  given  without  proof. 

Theorem 2.5.1 Let V be a vector space over $\mathbb{R}$ with a basis made of n elements. Then,

(i)  any  free  system  I  in  V  contains  at  most  n  elements, 

(ii)  any  system  of  generators  for  V  has  at  least  n  elements, 

(iii)  any  basis  for  V  has  n  elements. 

This  theorem  makes  sure  that  the  following  definition  is  consistent. 

Definition 2.5.2 If there exists a positive integer $n > 0$, such that the real vector space V has a basis with n elements, we say that V has dimension n, and write $\dim V = n$. If V is not finitely generated we set $\dim V = \infty$. If $V = \{0_V\}$ we set $\dim V = 0$.


Exercise 2.5.3 Following what we have extensively described above, it is clear that $\dim V^2_O = 2$ and $\dim V^3_O = 3$. Also $\dim \mathbb{R}^n = n$, with $\dim \mathbb{R} = 1$, and we have that $\dim \mathbb{R}[x] = \infty$ while $\dim \mathbb{R}[x]_r = r + 1$. Referring to the Exercise 2.4.14, one has that $\dim_{\mathbb{C}} \mathbb{C}^n = n$ while $\dim_{\mathbb{R}} \mathbb{C}^n = 2n$.

We  omit  the  proof  of  the  following  results. 

Proposition 2.5.4 Let V be an n-dimensional vector space, and W a vector subspace of V. Then, $\dim(W) \leq n$, while $\dim(W) = n$ if and only if $W = V$.

Corollary 2.5.5 Let V be an n-dimensional vector space, and $v_1, \dots, v_n \in V$. The following facts are equivalent:

(i) the system $(v_1, \dots, v_n)$ is a basis for V,

(ii) the system $\{v_1, \dots, v_n\}$ is free,

(iii) the system $\{v_1, \dots, v_n\}$ generates V.

Theorem 2.5.6 (Grassmann) Let V be a finite dimensional vector space, with U and W two vector subspaces of V. It holds that

$$\dim(U + W) = \dim(U) + \dim(W) - \dim(U \cap W).$$

Proof Denote $r = \dim(U)$, $s = \dim(W)$ and $p = \dim(U \cap W)$. We need to show that $U + W$ has a basis with $r + s - p$ elements.

Let $(v_1, \dots, v_p)$ be a basis for $U \cap W$. By the Theorem 2.4.19 such a free system can be completed to a basis $(v_1, \dots, v_p, u_1, \dots, u_{r-p})$ for U and to a basis $(v_1, \dots, v_p, w_1, \dots, w_{s-p})$ for W.

We then show that $I = (v_1, \dots, v_p, u_1, \dots, u_{r-p}, w_1, \dots, w_{s-p})$ is a basis for the vector space $U + W$. Since any vector in $U + W$ has the form $u + w$, with $u \in U$ and $w \in W$, and since u is a linear combination of $v_1, \dots, v_p, u_1, \dots, u_{r-p}$, while w is a linear combination of $v_1, \dots, v_p, w_1, \dots, w_{s-p}$, the system I generates $U + W$. Next, consider the combination

$$\alpha_1 v_1 + \cdots + \alpha_p v_p + \beta_1 u_1 + \cdots + \beta_{r-p} u_{r-p} + \gamma_1 w_1 + \cdots + \gamma_{s-p} w_{s-p} = 0_V.$$

Denoting for brevity $v = \sum_{i=1}^{p} \alpha_i v_i$, $u = \sum_{j=1}^{r-p} \beta_j u_j$ and $w = \sum_{k=1}^{s-p} \gamma_k w_k$, we write this equality as

$$v + u + w = 0_V,$$

with $v \in U \cap W$, $u \in U$, $w \in W$. Since $v, u \in U$, then $w = -v - u \in U$; so $w \in U \cap W$. This implies

$$w = \gamma_1 w_1 + \cdots + \gamma_{s-p} w_{s-p} = \lambda_1 v_1 + \cdots + \lambda_p v_p$$

for suitable scalars $\lambda_i$; in fact we know that $\{v_1, \dots, v_p, w_1, \dots, w_{s-p}\}$ is a free system, so any $\gamma_k$ must be zero. We need then to prove that, from

$$\alpha_1 v_1 + \cdots + \alpha_p v_p + \beta_1 u_1 + \cdots + \beta_{r-p} u_{r-p} = 0_V$$

it follows that all the coefficients $\alpha_i$ and $\beta_j$ are zero. This is true, since $(v_1, \dots, v_p, u_1, \dots, u_{r-p})$ is a basis for U. Thus I is a free system. □
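The Grassmann formula can be illustrated on a concrete pair of subspaces of $\mathbb{R}^4$. In the sketch below the dimensions of U, W and U + W are computed as ranks by Gaussian elimination (an assumption, since ranks are not introduced in this section), the dimension of the intersection is read off the formula, and two hand-picked vectors are then checked to lie in both subspaces. The names `rank`, `in_span`, `U`, `W` and `candidates` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r

def in_span(v, generators):
    """v lies in the span exactly when adding it does not raise the rank."""
    return rank(generators + [v]) == rank(generators)

U = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]   # generators of U
W = [(0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0)]   # generators of W

dim_U, dim_W = rank(U), rank(W)
dim_sum = rank(U + W)                 # U + W is generated by the union
dim_cap = dim_U + dim_W - dim_sum     # Grassmann formula solved for dim(U ∩ W)

# Two independent vectors lying in both subspaces, consistent with dim_cap = 2.
candidates = [(0, 0, 1, 0), (1, 1, 0, 0)]
in_both = all(in_span(c, U) and in_span(c, W) for c in candidates)
print(dim_U, dim_W, dim_sum, dim_cap, in_both)  # 3 3 4 2 True
```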

Corollary 2.5.7 Let $W_1$ and $W_2$ be vector subspaces of V. If $W_1 \oplus W_2$ can be defined, then

$$\dim(W_1 \oplus W_2) = \dim(W_1) + \dim(W_2).$$

Also, if $\mathcal{B}_1 = (w'_1, \dots, w'_s)$ and $\mathcal{B}_2 = (w''_1, \dots, w''_r)$ are bases for $W_1$ and $W_2$ respectively, a basis for $W_1 \oplus W_2$ is given by $\mathcal{B} = (w'_1, \dots, w'_s, w''_1, \dots, w''_r)$.

Proof By the Grassmann theorem, one has

$$\dim(W_1 + W_2) + \dim(W_1 \cap W_2) = \dim(W_1) + \dim(W_2)$$

and from the Definition 2.2.12 we also have $\dim(W_1 \cap W_2) = 0$, which gives the first claim.

With the bases $\mathcal{B}_1$ and $\mathcal{B}_2$ one considers $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ which obviously generates $W_1 \oplus W_2$. The second claim directly follows from the Corollary 2.5.5. □

The  following  proposition  is  a  direct  generalization. 

Proposition 2.5.8 Let $W_1, \dots, W_n$ be subspaces of a real vector space V and let the direct sum $W_1 \oplus \cdots \oplus W_n$ be defined. One has that

$$\dim(W_1 \oplus \cdots \oplus W_n) = \dim(W_1) + \cdots + \dim(W_n).$$


Chapter  3 

Euclidean  Vector  Spaces 




When dealing with vectors of $V^3_O$ in Chap. 1, we have somehow implicitly used the notions of length for a vector and of orthogonality of vectors as well as amplitude of plane angle between vectors. In order to generalise all of this, in the present chapter we introduce the structure of scalar product for any vector space, thus coming to the notion of euclidean vector space. A scalar product allows one to speak, among other things, of orthogonality of vectors or of the length of a vector in an arbitrary vector space.


3.1  Scalar  Product,  Norm 

We start by recalling, through an example, how the vector space $\mathbb{R}^3$ can be endowed with a euclidean scalar product using the usual scalar product in the space $V^3_O$.

Example 3.1.1 The usual scalar product in $V^3_O$, under the isomorphism $\mathbb{R}^3 \simeq V^3_O$ (see the Proposition 1.3.9), induces a map

$$\cdot : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$$

defined as

$$(x_1, x_2, x_3) \cdot (y_1, y_2, y_3) = x_1 y_1 + x_2 y_2 + x_3 y_3.$$

For vectors $(x_1, x_2, x_3), (y_1, y_2, y_3), (z_1, z_2, z_3) \in \mathbb{R}^3$ and scalars $a, b \in \mathbb{R}$, the following properties are easy to verify.

(i) Symmetry, that is:

$$\begin{aligned}
(x_1, x_2, x_3) \cdot (y_1, y_2, y_3) &= x_1 y_1 + x_2 y_2 + x_3 y_3 \\
&= y_1 x_1 + y_2 x_2 + y_3 x_3 = (y_1, y_2, y_3) \cdot (x_1, x_2, x_3).
\end{aligned}$$

(ii) Linearity, that is:

$$\begin{aligned}
(a(x_1, x_2, x_3) + b(y_1, y_2, y_3)) \cdot (z_1, z_2, z_3) &= (a x_1 + b y_1) z_1 + (a x_2 + b y_2) z_2 + (a x_3 + b y_3) z_3 \\
&= a(x_1 z_1 + x_2 z_2 + x_3 z_3) + b(y_1 z_1 + y_2 z_2 + y_3 z_3) \\
&= a (x_1, x_2, x_3) \cdot (z_1, z_2, z_3) + b (y_1, y_2, y_3) \cdot (z_1, z_2, z_3).
\end{aligned}$$

(iii) Non negativity, that is:

$$(x_1, x_2, x_3) \cdot (x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 \geq 0.$$

(iv) Non degeneracy, that is:

$$(x_1, x_2, x_3) \cdot (x_1, x_2, x_3) = 0 \iff (x_1, x_2, x_3) = (0, 0, 0).$$

These last two properties are summarised by saying that the scalar product in $\mathbb{R}^3$ is positive definite.

The  above  properties  suggest  the  following  definition. 

Definition 3.1.2 Let V be a finite dimensional real vector space. A scalar product on V is a map

$$\cdot : V \times V \to \mathbb{R}, \qquad (v, w) \mapsto v \cdot w$$

that fulfils the following properties. For any $v, w, v_1, v_2 \in V$ and $a_1, a_2 \in \mathbb{R}$ it holds that:

(i) $v \cdot w = w \cdot v$,

(ii) $(a_1 v_1 + a_2 v_2) \cdot w = a_1 (v_1 \cdot w) + a_2 (v_2 \cdot w)$,

(iii) $v \cdot v \geq 0$,

(iv) $v \cdot v = 0 \iff v = 0_V$.

A finite dimensional real vector space V equipped with a scalar product will be denoted $(V, \cdot)$ and will be referred to as a euclidean vector space.

Clearly the properties (i) and (ii) in the previous definition allow one to prove that the scalar product map $\cdot$ is linear also with respect to the second argument. A scalar product is then a suitable bilinear symmetric map, also called a bilinear symmetric real form since its range is in $\mathbb{R}$.

Exercise 3.1.3 It is clear that the scalar product considered in $V^3_O$ satisfies the conditions given in the Definition 3.1.2. The map in the Example 3.1.1 is a scalar product on the vector space $\mathbb{R}^3$. This scalar product is not unique. Indeed, consider for instance $p : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ given by

$$p((x_1, x_2, x_3), (y_1, y_2, y_3)) = 2 x_1 y_1 + 3 x_2 y_2 + x_3 y_3.$$

It is easy to verify that such a map p is bilinear and symmetric. With $v = (v_1, v_2, v_3)$, from $p(v, v) = 2 v_1^2 + 3 v_2^2 + v_3^2$ one has $p(v, v) \geq 0$ and $p(v, v) = 0 \iff v = 0$. We have then that p is a scalar product on $\mathbb{R}^3$.
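The properties claimed for p can be spot-checked on sample vectors. The lines below are a minimal sketch with integer entries (so that all comparisons are exact); a check on a few vectors is of course not a proof, and the helper name `lin` and the sample values are hypothetical.

```python
def p(x, y):
    """The map p((x1,x2,x3),(y1,y2,y3)) = 2 x1 y1 + 3 x2 y2 + x3 y3."""
    return 2 * x[0] * y[0] + 3 * x[1] * y[1] + x[2] * y[2]

def lin(a, x, b, y):
    """The vector a x + b y, computed componentwise."""
    return tuple(a * xi + b * yi for xi, yi in zip(x, y))

x, y, z = (1, -2, 3), (0, 4, -1), (2, 1, 5)
a, b = 3, -2

symmetric = p(x, y) == p(y, x)
linear = p(lin(a, x, b, y), z) == a * p(x, z) + b * p(y, z)
positive = p(x, x) > 0 and p((0, 0, 0), (0, 0, 0)) == 0
print(symmetric, linear, positive, p(x, x))  # True True True 23
```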

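Definition 3.1.4 On $\mathbb{R}^n$ there is a canonical scalar product

$$\cdot : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$$

defined by

$$(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = x_1 y_1 + \cdots + x_n y_n = \sum_{j=1}^{n} x_j y_j.$$

The datum $(\mathbb{R}^n, \cdot)$ is referred to as the canonical euclidean space and denoted $E^n$.

The following lines sketch the proof that the above map satisfies the conditions of Definition 3.1.2.

(i) $(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = \sum_{j=1}^{n} x_j y_j = \sum_{j=1}^{n} y_j x_j = (y_1, \dots, y_n) \cdot (x_1, \dots, x_n)$,

(ii) left to the reader,

(iii) $(x_1, \dots, x_n) \cdot (x_1, \dots, x_n) = \sum_{i=1}^{n} x_i^2 \geq 0$,

(iv) $(x_1, \dots, x_n) \cdot (x_1, \dots, x_n) = 0 \iff \sum_{i=1}^{n} x_i^2 = 0 \iff x_i = 0 \ \forall i \iff (x_1, \dots, x_n) = (0, \dots, 0)$.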

Definition 3.1.5 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. The map

$$\| - \| : V \to \mathbb{R}, \qquad v \mapsto \|v\| = \sqrt{v \cdot v}$$

is called norm. For any $v \in V$, the real number $\|v\|$ is the norm or the length of the vector v.

Exercise 3.1.6 The norm of a vector $v = (x_1, \dots, x_n)$ in $E^n = (\mathbb{R}^n, \cdot)$ is

$$\|(x_1, \dots, x_n)\| = \sqrt{\sum_{i=1}^{n} x_i^2}.$$

In particular, for $E^3$ one has $\|(x_1, x_2, x_3)\| = \sqrt{x_1^2 + x_2^2 + x_3^2}$.

The  proof  of  the  following  proposition  is  immediate. 

Proposition 3.1.7 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. For any $v \in V$ and any $a \in \mathbb{R}$, the following properties hold:

(1) $\|v\| \geq 0$,

(2) $\|v\| = 0 \iff v = 0_V$,

(3) $\|av\| = |a| \, \|v\|$.

Proposition 3.1.8 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. For any $v, w \in V$ the following inequality holds:

$$|v \cdot w| \leq \|v\| \, \|w\|.$$


This  is  called  the  Schwarz  inequality. 

Proof If either $v = 0_V$ or $w = 0_V$ the claim is obvious, so we may assume that both vectors $v, w \neq 0_V$. Set $a = \|w\|$ and $b = \|v\|$; from (iii) in the Definition 3.1.2, one can write

$$\begin{aligned}
0 \leq \|av \pm bw\|^2 &= (av \pm bw) \cdot (av \pm bw) \\
&= a^2 \|v\|^2 \pm 2ab (v \cdot w) + b^2 \|w\|^2 \\
&= 2ab (\|v\| \, \|w\| \pm v \cdot w).
\end{aligned}$$

Since both a, b are real positive scalars, the above expression reads

$$\mp\, v \cdot w \leq \|v\| \, \|w\|$$

which is the claim. □

Definition 3.1.9 The Schwarz inequality can be written in the form

$$\frac{|v \cdot w|}{\|v\| \, \|w\|} \leq 1,$$

that is

$$-1 \leq \frac{v \cdot w}{\|v\| \, \|w\|} \leq 1.$$

Then one can define the angle $\alpha$ between the vectors v, w, by requiring that

$$\frac{v \cdot w}{\|v\| \, \|w\|} = \cos \alpha$$

with $0 \leq \alpha \leq \pi$. Notice the analogy between such a definition and the one in Definition 1.3.2 for the geometric vectors in $V^3_O$.
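For the canonical scalar product of $E^n$, the Schwarz inequality and the angle just defined can be evaluated numerically. The following is a minimal sketch in floating point arithmetic; the function names `dot`, `norm` and `angle` are hypothetical, and the clamping step is an assumption added only to protect `acos` from rounding errors.

```python
import math

def dot(v, w):
    """Canonical scalar product in E^n."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in [0, pi] between two non-zero vectors."""
    c = dot(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] to absorb floating point rounding before acos.
    return math.acos(max(-1.0, min(1.0, c)))

v, w = (1, 0, 1), (0, 1, 1)
schwarz_holds = abs(dot(v, w)) <= norm(v) * norm(w)
alpha = angle(v, w)   # cos(alpha) = 1 / (sqrt(2) * sqrt(2)) = 1/2

print(schwarz_holds)               # True
print(math.isclose(alpha, math.pi / 3))
```

For orthogonal vectors, such as (1, 0) and (0, 1), the same function returns an angle of pi/2.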

Proposition 3.1.10 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. For any $v, w \in V$ the following inequality holds:

$$\|v + w\| \leq \|v\| + \|w\|.$$

This is called the triangle, or Minkowski inequality.

Proof From the definition of the norm and the Schwarz inequality in Proposition 3.1.8, one has $v \cdot w \leq |v \cdot w| \leq \|v\| \, \|w\|$. The following relations are immediate,

$$\begin{aligned}
\|v + w\|^2 &= (v + w) \cdot (v + w) \\
&= \|v\|^2 + 2 (v \cdot w) + \|w\|^2 \\
&\leq \|v\|^2 + 2 \|v\| \, \|w\| + \|w\|^2 \\
&= (\|v\| + \|w\|)^2
\end{aligned}$$

and prove the claim. □


3.2  Orthogonality 

As  mentioned,  with  a  scalar  product  one  generalises  the  notion  of  orthogonality 
between  vectors  and  then  between  vector  subspaces. 

Definition 3.2.1 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. Two vectors $v, w \in V$ are said to be orthogonal if $v \cdot w = 0$.

Proposition 3.2.2 Let $(V, \cdot)$ be a finite dimensional euclidean vector space, and let $w_1, \dots, w_s$ and v be vectors in V. If v is orthogonal to each $w_i$, then v is orthogonal to any vector in the linear span $\mathcal{L}(w_1, \dots, w_s)$.

Proof From the bilinearity of the scalar product, one has

$$v \cdot (\lambda_1 w_1 + \cdots + \lambda_s w_s) = \lambda_1 (v \cdot w_1) + \cdots + \lambda_s (v \cdot w_s).$$

The right hand side of such expression is obviously zero under the hypothesis of orthogonality, that is $v \cdot w_i = 0$ for any i. □

Proposition 3.2.3 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. If $v_1, \dots, v_s$ is a collection of non zero vectors which are mutually orthogonal, that is $v_i \cdot v_j = 0$ for $i \neq j$, then the vectors $v_1, \dots, v_s$ are linearly independent.

Proof Let us equate to the zero vector a linear combination of the vectors $v_1, \dots, v_s$, that is, let

$$\lambda_1 v_1 + \cdots + \lambda_s v_s = 0_V.$$

For $v_i \in \{v_1, \dots, v_s\}$, we have

$$0 = v_i \cdot (\lambda_1 v_1 + \cdots + \lambda_s v_s) = \lambda_1 (v_i \cdot v_1) + \cdots + \lambda_s (v_i \cdot v_s) = \lambda_i \|v_i\|^2.$$

Being $v_i \neq 0_V$ it must be $\lambda_i = 0$. One gets $\lambda_1 = \cdots = \lambda_s = 0$, with the same argument for any vector $v_i$. □
Definition 3.2.4 Let $(V, \cdot)$ be a finite dimensional euclidean vector space. If $W \subseteq V$ is a vector subspace of V, then the set

$$W^\perp = \{v \in V : v \cdot w = 0, \ \forall w \in W\}$$

is called the orthogonal complement to W.

Proposition 3.2.5 Let $W \subseteq V$ be a vector subspace of a euclidean vector space $(V, \cdot)$. Then,

(i) $W^\perp$ is a vector subspace of V,

(ii) $W \cap W^\perp = \{0_V\}$, and the sum between W and $W^\perp$ is direct.

Proof (i) Let $v_1, v_2 \in W^\perp$, that is $v_1 \cdot w = 0$ and $v_2 \cdot w = 0$ for any $w \in W$. With arbitrary scalars $\lambda_1, \lambda_2 \in \mathbb{R}$, one has

$$(\lambda_1 v_1 + \lambda_2 v_2) \cdot w = \lambda_1 (v_1 \cdot w) + \lambda_2 (v_2 \cdot w) = 0$$

for any $w \in W$; thus $\lambda_1 v_1 + \lambda_2 v_2 \in W^\perp$. The claim follows by recalling the Proposition 2.2.2.

(ii) If $w \in W \cap W^\perp$, then $w \cdot w = 0$, which then gives $w = 0_V$. □

Remark 3.2.6 Let $W = \mathcal{L}(w_1, \dots, w_s) \subseteq V$. One has

$$W^\perp = \{v \in V \mid v \cdot w_i = 0, \ \forall i = 1, \dots, s\}.$$

The inclusion of $W^\perp$ in the set on the right hand side is obvious, since each $w_i$ is in W, while the opposite inclusion follows from the Proposition 3.2.2.

Exercise 3.2.7 Consider the vector subspace $W = \mathcal{L}((1, 0, 1)) \subset E^3$. From the previous remark we have

$$W^\perp = \{(x, y, z) \in E^3 \mid (x, y, z) \cdot (1, 0, 1) = 0\} = \{(x, y, z) \in E^3 \mid x + z = 0\},$$

that is $W^\perp = \mathcal{L}((1, 0, -1), (0, 1, 0))$.

Exercise 3.2.8 Let $W \subset E^4$ be defined by

$$W = \mathcal{L}((1, -1, 1, 0), (2, 1, 0, 1)).$$

By recalling the Proposition 3.2.3 and the Corollary 2.5.7 we know that the orthogonal subspace $W^\perp$ has dimension 2. From the Remark 3.2.6, it is given by

$$W^\perp = \left\{ (x, y, z, t) \in E^4 \ : \ \begin{array}{l} (x, y, z, t) \cdot (1, -1, 1, 0) = 0 \\ (x, y, z, t) \cdot (2, 1, 0, 1) = 0 \end{array} \right\},$$

that is by the common solutions of the following linear equations,

$$\begin{cases} x - y + z = 0 \\ 2x + y + t = 0 \end{cases}.$$

Such solutions can be written as

$$\begin{cases} z = y - x \\ t = -2x - y \end{cases}$$

for arbitrary values of x, y. By choosing, for example, $(x, y) = (1, 0)$ and $(x, y) = (0, 1)$, for the orthogonal subspace $W^\perp$ one can show that $W^\perp = \mathcal{L}((1, 0, -1, -2), (0, 1, 1, -1))$ (this kind of examples and exercises will be clearer after studying homogeneous linear systems of equations).
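The result of the exercise can be verified directly: each of the two vectors found for the orthogonal subspace must have vanishing canonical scalar product with both generators of W. The lines below are a minimal sketch; the names `dot`, `complement_vector` and `checks` are hypothetical.

```python
def dot(v, w):
    """Canonical scalar product in E^n."""
    return sum(vi * wi for vi, wi in zip(v, w))

W_generators = [(1, -1, 1, 0), (2, 1, 0, 1)]

def complement_vector(x, y):
    """Solution of the system with free parameters x, y: z = y - x, t = -2x - y."""
    return (x, y, y - x, -2 * x - y)

basis_perp = [complement_vector(1, 0), complement_vector(0, 1)]
checks = [dot(v, w) for v in basis_perp for w in W_generators]

print(basis_perp)  # [(1, 0, -1, -2), (0, 1, 1, -1)]
print(checks)      # [0, 0, 0, 0]
```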


3.3  Orthonormal  Basis 


We have seen in Chap. 2 that the orthogonal cartesian coordinate system (O, i, j, k) for the vector space $V^3_O$ can be seen as having a basis whose vectors are mutually orthogonal and have norm one.

In  this  section  we  analyse  how  to  select  in  a  finite  dimensional  euclidean  vector 
space  (V,  •),  a  basis  whose  vectors  are  mutually  orthogonal  and  have  norm  one. 

Definition 3.3.1 Let $I = \{v_1, \dots, v_r\}$ be a system of vectors of a vector space V. If V is endowed with a scalar product, I is called orthonormal if

$$v_i \cdot v_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}.$$

Remark 3.3.2 From the Proposition 3.2.3 one has that any orthonormal system of vectors is free, that is its vectors are linearly independent.

Definition 3.3.3 A basis $\mathcal{B}$ for $(V, \cdot)$ is called orthonormal if it is an orthonormal system.

By such a definition, the basis (i, j, k) of $V^3_O$ as well as the canonical basis for $E^n$ are orthonormal.


Remark 3.3.4 Let $\mathcal{B} = (e_1, \dots, e_n)$ be an orthonormal basis for $(V, \cdot)$ and let $v \in V$. The vector v can be written with respect to $\mathcal{B}$ as

$$v = (v \cdot e_1) e_1 + \cdots + (v \cdot e_n) e_n.$$

Indeed, from

$$v = a_1 e_1 + \cdots + a_n e_n$$

one can consider the scalar products of v with each $e_i$, and the orthogonality of these yields

$$a_1 = v \cdot e_1, \quad \dots, \quad a_n = v \cdot e_n.$$

Thus the components of a vector with respect to an orthonormal basis are given by the scalar products of the vector with the corresponding basis elements.
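The remark can be illustrated in $E^2$ with an orthonormal basis different from the canonical one. The sketch below uses exact rational arithmetic; the basis ((3/5, 4/5), (-4/5, 3/5)) and the vector (2, 1) are hypothetical choices made only for the illustration.

```python
from fractions import Fraction

def dot(v, w):
    """Canonical scalar product in E^n."""
    return sum(vi * wi for vi, wi in zip(v, w))

e1 = (Fraction(3, 5), Fraction(4, 5))
e2 = (Fraction(-4, 5), Fraction(3, 5))
basis = (e1, e2)

# Check the orthonormality relations e_i . e_j = delta_ij.
orthonormal = all(dot(basis[i], basis[j]) == (1 if i == j else 0)
                  for i in range(2) for j in range(2))

v = (2, 1)
comps = tuple(dot(v, e) for e in basis)       # a_i = v . e_i
# Rebuild v as a_1 e_1 + a_2 e_2 to confirm the decomposition.
rebuilt = tuple(sum(a * e[k] for a, e in zip(comps, basis)) for k in range(2))

print(orthonormal, comps == (2, -1), rebuilt == v)  # True True True
```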

Definition  3.3.5  Let  B  =  (e  ,  en)  be  an  orthonormal  basis  for  (V,  •)•  With 
v  g  V,  the  vectors 

(v  •  C\)C\,  ...  ,  (v  €n^6n, 


which  give  a  linear  decomposition  of  v,  are  called  the  orthogonal  projections  of  v 
along  ei,  . . . ,  en. 

The  next  proposition  shows  that  in  an  any  finite  dimensional  real  vector  space 
(V,  •),  with  respect  to  an  orthonormal  basis  for  V  the  scalar  product  has  the  same 
form  than  the  canonical  scalar  product  in  En . 

Proposition 3.3.6 Let $B = (e_1, \dots, e_n)$ be an orthonormal basis for $(V, \cdot)$. With
$v, w \in V$, let it be $v = (a_1, \dots, a_n)_B$ and $w = (b_1, \dots, b_n)_B$. Then one has

$$v \cdot w = a_1 b_1 + \dots + a_n b_n.$$

Proof This follows by using the bilinearity of the scalar product and the relations
$e_i \cdot e_j = \delta_{ij}$. □

Any finite dimensional real vector space with a scalar product can be shown to admit an
orthonormal basis. This is done via the so-called Gram-Schmidt orthonormalisation method. Its
proof is constructive since, out of any given basis, the method provides an explicit
orthonormal basis via linear algebra computations.

Proposition 3.3.7 (Gram-Schmidt method) Let $B = (v_1, \dots, v_n)$ be a basis for the
finite dimensional euclidean space $(V, \cdot)$. The vectors

$$e_1 = \frac{v_1}{\|v_1\|}, \qquad e_2 = \frac{v_2 - (v_2 \cdot e_1)e_1}{\|v_2 - (v_2 \cdot e_1)e_1\|}, \qquad \dots, \qquad e_n = \frac{v_n - \sum_{i=1}^{n-1}(v_n \cdot e_i)e_i}{\|v_n - \sum_{i=1}^{n-1}(v_n \cdot e_i)e_i\|}$$

form an orthonormal basis $(e_1, \dots, e_n)$ for $V$.

Proof We start by noticing that $\|e_j\| = 1$, for $j = 1, \dots, n$, from the way these
vectors are defined. The proof of orthogonality is done by induction. As induction
basis we prove explicitly that $e_1 \cdot e_2 = 0$. Being $e_1 \cdot e_1 = 1$, one has

$$e_1 \cdot e_2 = \frac{e_1 \cdot v_2 - (v_2 \cdot e_1)(e_1 \cdot e_1)}{\|v_2 - (v_2 \cdot e_1)e_1\|} = 0.$$

We then assume that $e_1, \dots, e_h$ are pairwise orthogonal (this is the inductive
hypothesis) and show that $e_1, \dots, e_{h+1}$ are pairwise orthogonal. Consider an integer
$k$ such that $1 \leq k \leq h$. Then,

$$e_{h+1} \cdot e_k = \frac{\left(v_{h+1} - \sum_{i=1}^{h}(v_{h+1} \cdot e_i)e_i\right) \cdot e_k}{\|v_{h+1} - \sum_{i=1}^{h}(v_{h+1} \cdot e_i)e_i\|} = \frac{v_{h+1} \cdot e_k - \sum_{i=1}^{h}(v_{h+1} \cdot e_i)(e_i \cdot e_k)}{\|v_{h+1} - \sum_{i=1}^{h}(v_{h+1} \cdot e_i)e_i\|} = \frac{v_{h+1} \cdot e_k - v_{h+1} \cdot e_k}{\|v_{h+1} - \sum_{i=1}^{h}(v_{h+1} \cdot e_i)e_i\|} = 0,$$

where the last but one equality follows from the inductive hypothesis $e_i \cdot e_k = 0$ for $i \neq k$, together with $e_k \cdot e_k = 1$. The system
$(e_1, \dots, e_n)$ is free by Remark 3.3.2, thus giving an orthonormal basis for $V$. □


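Exercise 3.3.8 Let $V = \mathcal{L}(v_1, v_2) \subset E^4$, with $v_1 = (1, 1, 0, 0)$ and $v_2 = (0, 2, 1, 1)$.
With the Gram-Schmidt orthogonalization method, we obtain an orthonormal basis
for $V$. Firstly, we have

$$e_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{2}}(1, 1, 0, 0).$$

Set $f_2 = v_2 - (v_2 \cdot e_1)e_1$. We have then

$$f_2 = (0, 2, 1, 1) - \left((0, 2, 1, 1) \cdot \frac{1}{\sqrt{2}}(1, 1, 0, 0)\right)\frac{1}{\sqrt{2}}(1, 1, 0, 0) = (0, 2, 1, 1) - (1, 1, 0, 0) = (-1, 1, 1, 1).$$

Then, the second vector $e_2 = \frac{f_2}{\|f_2\|}$ is

$$e_2 = \frac{1}{2}(-1, 1, 1, 1).$$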
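The Gram-Schmidt method of Proposition 3.3.7 can be sketched in a few lines of code. This is a minimal illustration for the canonical scalar product on $E^n$, applied to the vectors $v_1 = (1, 1, 0, 0)$ and $v_2 = (0, 2, 1, 1)$; the function names are assumptions made for the example, and floating point arithmetic is used, so results hold up to rounding.

```python
import math

def dot(u, v):
    """Canonical scalar product on E^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        f = list(v)
        # Subtract the orthogonal projections of v along the previous e_i.
        for e in basis:
            c = dot(v, e)
            f = [fi - c * ei for fi, ei in zip(f, e)]
        norm = math.sqrt(dot(f, f))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([fi / norm for fi in f])
    return basis

e1, e2 = gram_schmidt([(1, 1, 0, 0), (0, 2, 1, 1)])
print(e1)  # approximately (1/sqrt(2)) * (1, 1, 0, 0)
print(e2)  # approximately (1/2) * (-1, 1, 1, 1)
```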

Theorem  3.3.9  Any  finite  dimensional  euclidean  vector  space  (V,  •)  admits  an 
orthonormal  basis. 


Proof Since $V$ is finite dimensional, by Corollary 2.4.4 it has a basis, which can
be orthonormalised using the Gram-Schmidt method. □

Theorem 3.3.10 Let $(V, \cdot)$ be finite dimensional with $\{e_1, \dots, e_r\}$ an orthonormal
system of vectors of $V$. The system can be completed to an orthonormal basis
$(e_1, \dots, e_r, e_{r+1}, \dots, e_n)$ for $V$.

Proof From Theorem 2.4.19 the free system $\{e_1, \dots, e_r\}$ can be completed to a
basis for $V$, say

$$B = (e_1, \dots, e_r, v_{r+1}, \dots, v_n).$$

The Gram-Schmidt method for the system $B$ does not alter the first $r$ vectors, and
provides an orthonormal basis for $V$. □

Corollary 3.3.11 Let $(V, \cdot)$ have finite dimension $n$ and let $W$ be a vector subspace
of $V$. Then,

(1) $\dim(W) + \dim(W^\perp) = n$,

(2) $V = W \oplus W^\perp$,

(3) $(W^\perp)^\perp = W$.

Proof

(1) Let $(e_1, \dots, e_r)$ be an orthonormal basis for $W$ completed (by the theorem
above) to an orthonormal basis $(e_1, \dots, e_r, e_{r+1}, \dots, e_n)$ for $V$. Since the vectors
$e_{r+1}, \dots, e_n$ are then orthogonal to the vectors $e_1, \dots, e_r$, they are (see
Definition 3.2.1) orthogonal to any vector in $W$, so $e_{r+1}, \dots, e_n \in W^\perp$. This
gives $\dim(W^\perp) \geq n - r$, that is $\dim(W) + \dim(W^\perp) \geq n$. From Definition 3.2.4
the sum of $W$ and $W^\perp$ is direct, so, recalling Corollary 2.5.7,
one has $\dim(W) + \dim(W^\perp) = \dim(W \oplus W^\perp) \leq n$, thus proving the claim.

(2) From (1) we have $\dim(W \oplus W^\perp) = \dim(W) + \dim(W^\perp) = n = \dim(V)$; thus
$W \oplus W^\perp = V$.

(3) We start by proving the inclusion $(W^\perp)^\perp \supseteq W$.
By definition, it is $(W^\perp)^\perp = \{v \in V \mid v \cdot w = 0, \ \forall w \in W^\perp\}$. If $v \in W$, then
$v \cdot w = 0$ for any $w \in W^\perp$, thus $W \subseteq (W^\perp)^\perp$. Apply now the result in point
(1) to $W^\perp$: one has

$$\dim(W^\perp) + \dim((W^\perp)^\perp) = n.$$

This equality, together with point (1), gives $\dim((W^\perp)^\perp) = \dim(W)$; the
space $W$ is a subspace of $(W^\perp)^\perp$ with the same dimension, thus
they coincide. □

It is worth stressing that for the identity $(W^\perp)^\perp = W$ it is crucial that the vector
space $V$ be finite dimensional. For infinite dimensional vector spaces in general only
the inclusion $(W^\perp)^\perp \supseteq W$ holds.

Exercise 3.3.12 In Exercise 3.2.7 we considered the subspace of $E^3$ given by $W =
\mathcal{L}((1, 0, 1))$, and computed $W^\perp = \mathcal{L}((1, 0, -1), (0, 1, 0))$. It is immediate to verify
that

$$\dim(W) + \dim(W^\perp) = 1 + 2 = 3 = \dim(E^3).$$
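The completion of an orthonormal system to an orthonormal basis, and the resulting dimension count, can be illustrated numerically for $W = \mathcal{L}((1, 0, 1))$. The sketch below is a variation on the argument in the text, under stated assumptions: instead of first completing to a basis, it runs Gram-Schmidt over the canonical basis vectors and discards those that become (numerically) zero. The function name and the tolerance `tol` are choices made for this example.

```python
import math

def dot(u, v):
    """Canonical scalar product on E^n."""
    return sum(a * b for a, b in zip(u, v))

def extend_to_orthonormal_basis(system, dim, tol=1e-9):
    """Extend an orthonormal system to an orthonormal basis of E^dim."""
    basis = [list(e) for e in system]
    for k in range(dim):
        # Candidate: k-th vector of the canonical basis.
        f = [1.0 if i == k else 0.0 for i in range(dim)]
        for e in basis:
            c = dot(f, e)
            f = [fi - c * ei for fi, ei in zip(f, e)]
        norm = math.sqrt(dot(f, f))
        if norm > tol:  # keep only candidates independent of the current system
            basis.append([fi / norm for fi in f])
    return basis

s = 1 / math.sqrt(2)
w = (1, 0, 1)                      # generator of W
basis = extend_to_orthonormal_basis([(s, 0.0, s)], 3)
complement = basis[1:]             # orthonormal vectors spanning the orthogonal complement
print(len(basis), complement)
```

The two added vectors are orthogonal to `w`, in agreement with $\dim(W) + \dim(W^\perp) = 1 + 2 = 3$.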


3.4 Hermitian Products

The canonical scalar product in $\mathbb{R}^n$ can be naturally extended to the complex vector
space $\mathbb{C}^n$ with a minor modification.

Definition 3.4.1 The canonical hermitian product on $\mathbb{C}^n$ is the map

$$\cdot \; : \; \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}$$

defined by

$$(z_1, \dots, z_n) \cdot (w_1, \dots, w_n) = \bar{z}_1 w_1 + \dots + \bar{z}_n w_n,$$

where $\bar{z}$ denotes the complex conjugate of $z$ (see Sect. A.5). The datum $(\mathbb{C}^n, \cdot)$ is
called the canonical hermitian vector space of dimension $n$.
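As a small illustration of Definition 3.4.1, the canonical hermitian product can be computed with Python's built-in complex numbers. This is a sketch with hypothetical sample vectors; the conjugation is placed on the first entry, as in the definition.

```python
def hermitian(z, w):
    """Canonical hermitian product on C^n, conjugating the first entry."""
    return sum(zi.conjugate() * wi for zi, wi in zip(z, w))

# Hypothetical sample vectors in C^2.
z = (1 + 2j, 3j)
w = (2 - 1j, 1 + 1j)

print(hermitian(z, w))   # (3-8j)
print(hermitian(w, z))   # (3+8j), the complex conjugate of the previous value
print(hermitian(z, z))   # (14+0j), a real number
```

Exchanging the two arguments conjugates the result, and the product of a vector with itself has vanishing imaginary part.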

The following proposition, whose straightforward proof we omit, generalises
to the complex case the properties of the canonical scalar product on $\mathbb{R}^n$ shown
after Definition 3.1.4. For ease of notation, we shall denote the vectors in $\mathbb{C}^n$ by
$z = (z_1, \dots, z_n)$.

Proposition 3.4.2 For any $z, w, v \in \mathbb{C}^n$ and $a, b \in \mathbb{C}$, the following properties hold:

(i) $w \cdot z = \overline{z \cdot w}$,

(ii) $(az + bw) \cdot v = \bar{a}(z \cdot v) + \bar{b}(w \cdot v)$

while $v \cdot (az + bw) = a(v \cdot z) + b(v \cdot w)$,


Notice that the complex conjugation for the first entry of the hermitian scalar product
implies that the hermitian product of a vector with itself is a real positive number.
It is this number that gives the real norm of a complex vector $z = (z_1, \dots, z_n)$,
defined as

$$\|z\| = \sqrt{z \cdot z} = \sqrt{\sum_{i=1}^{n} |z_i|^2}.$$


Chapter  4 

Matrices 


Matrices are an important tool when dealing with many problems, notably the theory
of  systems  of  linear  equations  and  the  study  of  maps  (operators)  between  vector 
spaces.  This  chapter  is  devoted  to  their  basic  notions  and  properties. 


4.1  Basic  Notions 


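Definition 4.1.1 A matrix $M$ with entries in $\mathbb{R}$ (or a real matrix) is a collection
of elements $a_{ij} \in \mathbb{R}$, with $i = 1, \dots, m$; $j = 1, \dots, n$ and $m, n \in \mathbb{N}$, displayed as
follows

$$M = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}.$$

The matrix $M$ above is said to be made of $m$ row vectors in $\mathbb{R}^n$, that is

$$R_1 = (a_{11}, \dots, a_{1n}), \quad \dots, \quad R_m = (a_{m1}, \dots, a_{mn}),$$

or by $n$ column vectors in $\mathbb{R}^m$, that is

$$C_1 = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}, \quad \dots, \quad C_n = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}.$$

Thus the matrix $M$ above is an $m \times n$-matrix ($m$ rows $R_i \in \mathbb{R}^n$ and $n$ columns
$C_j \in \mathbb{R}^m$). As a shorthand, by $M = (a_{ij})$ we shall denote a matrix $M$ with entry
$a_{ij}$ at the $i$-th row and $j$-th column. We denote by $\mathbb{R}^{m,n}$ the collection of all
$m \times n$-matrices whose entries are in $\mathbb{R}$.

Remark 4.1.2 It is sometimes useful to consider a matrix $M \in \mathbb{R}^{m,n}$ as the collection
of its $n$ columns, or as the collection of its $m$ rows, that is to write

$$M = (C_1 \; C_2 \; \dots \; C_n) = \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_m \end{pmatrix}.$$

An element $M \in \mathbb{R}^{1,n}$ is called an $n$-dimensional row matrix,

$$M = (a_{11} \; a_{12} \; \dots \; a_{1n}),$$

while an element $M \in \mathbb{R}^{m,1}$ is called an $m$-dimensional column matrix,

$$M = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}.$$

A square matrix of order $n$ is an $n \times n$ matrix, that is an element in $\mathbb{R}^{n,n}$. An element
$M \in \mathbb{R}^{1,1}$ is a scalar, that is a single element in $\mathbb{R}$. If $A = (a_{ij}) \in \mathbb{R}^{n,n}$ is a square
matrix, the entries $(a_{11}, a_{22}, \dots, a_{nn})$ give the (principal) diagonal of $A$.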

Example 4.1.3 The bold typeset entries in

$$A = \begin{pmatrix} \mathbf{1} & 2 & 2 \\ -1 & \mathbf{0} & 3 \\ 2 & 4 & \mathbf{7} \end{pmatrix}$$

give the diagonal of $A$.

Proposition 4.1.4 The set $\mathbb{R}^{m,n}$ is a real vector space whose dimension is $mn$. With
$A = (a_{ij}), B = (b_{ij}) \in \mathbb{R}^{m,n}$ and $\lambda \in \mathbb{R}$, the vector space operations are defined by

$$A + B = (a_{ij} + b_{ij}), \qquad \lambda A = (\lambda a_{ij}).$$

Proof We task the reader to show that $\mathbb{R}^{m,n}$ equipped with the above defined operations
is a vector space. We only remark that the zero element in $\mathbb{R}^{m,n}$ is given by
the matrix $A$ with entries $a_{ij} = 0_{\mathbb{R}}$; such a matrix is also called the null matrix and
denoted $0_{\mathbb{R}^{m,n}}$.

In order to show that the dimension of $\mathbb{R}^{m,n}$ is $mn$, consider the elementary
$m \times n$-matrices

$$E_{rs} = (e^{(rs)}_{jk}), \quad \text{with} \quad e^{(rs)}_{jk} = \begin{cases} 1 & \text{if } (j, k) = (r, s) \\ 0 & \text{if } (j, k) \neq (r, s) \end{cases}.$$

Thus the matrix $E_{rs}$ has entries all zero but for the entry $(r, s)$ which is 1. Clearly
there are $mn$ of them and it is immediate to show that they form a basis for $\mathbb{R}^{m,n}$. □


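Exercise 4.1.5 Let $A = \begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix} \in \mathbb{R}^{2,3}$. One computes

$$\begin{pmatrix} 1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} + 2 \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} + 0 \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

$$= E_{11} + 2E_{12} - E_{13} - E_{22} + E_{23}.$$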
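The decomposition of a matrix along the elementary matrices $E_{rs}$ can be reproduced mechanically. The sketch below represents matrices as lists of rows and uses 1-based indices $(r, s)$ to match the notation of the text; the helper names are assumptions made for this example.

```python
def elementary(r, s, m, n):
    """Elementary m x n matrix E_rs: 1 at position (r, s), 0 elsewhere (1-based)."""
    return [[1 if (j, k) == (r, s) else 0 for k in range(1, n + 1)]
            for j in range(1, m + 1)]

def decompose(A):
    """Non-zero components (a_rs, r, s) of A along the elementary matrices."""
    return [(a, r, s)
            for r, row in enumerate(A, 1)
            for s, a in enumerate(row, 1) if a != 0]

def recombine(terms, m, n):
    """Sum of a_rs * E_rs over the given terms."""
    total = [[0] * n for _ in range(m)]
    for a, r, s in terms:
        E = elementary(r, s, m, n)
        total = [[t + a * e for t, e in zip(trow, erow)]
                 for trow, erow in zip(total, E)]
    return total

A = [[1, 2, -1], [0, -1, 1]]
terms = decompose(A)
print(terms)                    # [(1, 1, 1), (2, 1, 2), (-1, 1, 3), (-1, 2, 2), (1, 2, 3)]
print(recombine(terms, 2, 3))   # [[1, 2, -1], [0, -1, 1]]
```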


In  addition  to  forming  a  vector  space,  matrices  of  ‘matching  size’  can  be  multiplied. 

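Definition 4.1.6 If $A = (a_{ij}) \in \mathbb{R}^{m,n}$ and $B = (b_{jk}) \in \mathbb{R}^{n,p}$ the product between $A$
and $B$ is the matrix in $\mathbb{R}^{m,p}$ defined by

$$C = (c_{ik}) = AB \in \mathbb{R}^{m,p}, \quad \text{where} \quad c_{ik} = R_i^{(A)} \cdot C_k^{(B)} = \sum_{j=1}^{n} a_{ij} b_{jk},$$

with $i = 1, \dots, m$ and $k = 1, \dots, p$. Here $R_i^{(A)} \cdot C_k^{(B)}$ denotes the scalar product in
$\mathbb{R}^n$ between the $i$-th row vector $R_i^{(A)}$ of $A$ with the $k$-th column vector $C_k^{(B)}$ of $B$.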

Remark  4.1.7  Notice  that  the  product  A  B — also  called  the  row  by  column  product — 
is  defined  only  if  the  number  of  columns  of  A  equals  the  number  of  rows  of  B. 
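The row by column product can be sketched directly from its definition. In the example below matrices are lists of rows, the function name `matmul` is an assumption of this sketch, and the sample matrices $A \in \mathbb{R}^{2,3}$, $B \in \mathbb{R}^{3,2}$ are chosen for illustration.

```python
def matmul(A, B):
    """Row by column product of A (m x n) and B (n x p)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])
    # c_ik is the scalar product of the i-th row of A with the k-th column of B.
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(A))]

A = [[1, 2, -1], [3, 0, 1]]          # sample 2 x 3 matrix
B = [[1, 2], [2, 1], [3, 4]]         # sample 3 x 2 matrix

print(matmul(A, B))   # [[2, 0], [6, 10]]
print(matmul(B, A))   # [[7, 2, 1], [5, 4, -1], [15, 6, 1]]
```

Both products exist here because the sizes match in both orders, but they live in different spaces, $\mathbb{R}^{2,2}$ and $\mathbb{R}^{3,3}$.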


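Exercise 4.1.8 Consider the matrices

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 0 & 1 \end{pmatrix} \in \mathbb{R}^{2,3}, \qquad B = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 3 & 4 \end{pmatrix} \in \mathbb{R}^{3,2}.$$

One has $AB = C = (c_{ik}) \in \mathbb{R}^{2,2}$ with

$$C = \begin{pmatrix} 2 & 0 \\ 6 & 10 \end{pmatrix}.$$

On the other hand, $BA = C' = (c'_{rs}) \in \mathbb{R}^{3,3}$ with

$$C' = \begin{pmatrix} 7 & 2 & 1 \\ 5 & 4 & -1 \\ 15 & 6 & 1 \end{pmatrix}.$$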

Clearly, comparing $C$ with $C'$ is meaningless, since they are in different spaces.

Remark 4.1.9 With $A \in \mathbb{R}^{m,n}$ and $B \in \mathbb{R}^{p,q}$, the product $AB$ is defined only if
$n = p$, giving a matrix $AB \in \mathbb{R}^{m,q}$. It is clear that the product $BA$ is defined only if
$m = q$ and in such a case one has $BA \in \mathbb{R}^{p,n}$. When both the conditions $m = q$ and
$n = p$ are satisfied both products are defined. This is the case of the matrices $A$ and
$B$ in the previous exercise.

Let us consider the space $\mathbb{R}^{n,n}$ of square matrices of order $n$. If $A, B$ are in $\mathbb{R}^{n,n}$ then
both $AB$ and $BA$ are square matrices of order $n$. An example shows that in general
one has $AB \neq BA$. If


one  computes  that 


$$AB = \begin{pmatrix} 3 & -1 \\ 0 & -1 \end{pmatrix} \neq BA.$$

Thus the product of matrices is non commutative. We shall say that two matrices
$A, B \in \mathbb{R}^{n,n}$ commute if $AB = BA$. On the other hand, the associativity of the product
in $\mathbb{R}$ and its distributivity with respect to the sum, allow one to prove analogous
properties for the space of matrices.

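Proposition 4.1.10 The following identities hold:

(i) $A(BC) = (AB)C$, for any $A \in \mathbb{R}^{m,n}$, $B \in \mathbb{R}^{n,p}$, $C \in \mathbb{R}^{p,q}$,

(ii) $A(B + C) = AB + AC$, for any $A \in \mathbb{R}^{m,n}$, $B, C \in \mathbb{R}^{n,p}$,

(iii) $\lambda(AB) = (\lambda A)B = A(\lambda B)$, for any $A \in \mathbb{R}^{m,n}$, $B \in \mathbb{R}^{n,p}$, $\lambda \in \mathbb{R}$.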

Proof (i) Consider three matrices $A = (a_{ih}) \in \mathbb{R}^{m,n}$, $B = (b_{hk}) \in \mathbb{R}^{n,p}$ and
$C = (c_{kj}) \in \mathbb{R}^{p,q}$. From the definition of row by column product one has
$AB = (d_{ik})$ with $d_{ik} = \sum_{h=1}^{n} a_{ih} b_{hk}$, while $BC = (e_{hj})$ with $e_{hj} = \sum_{k=1}^{p} b_{hk} c_{kj}$.
The $ij$-entries of $(AB)C$ and $A(BC)$ are

$$\sum_{k=1}^{p} d_{ik} c_{kj} = \sum_{k=1}^{p} \left( \sum_{h=1}^{n} a_{ih} b_{hk} \right) c_{kj} = \sum_{k=1}^{p} \sum_{h=1}^{n} (a_{ih} b_{hk} c_{kj}),$$

$$\sum_{h=1}^{n} a_{ih} e_{hj} = \sum_{h=1}^{n} a_{ih} \left( \sum_{k=1}^{p} b_{hk} c_{kj} \right) = \sum_{h=1}^{n} \sum_{k=1}^{p} (a_{ih} b_{hk} c_{kj}).$$

These two expressions coincide (the last equality on both lines follows from
the distributivity in $\mathbb{R}$ of the product with respect to the sum).

(ii) Take matrices $A = (a_{ih}) \in \mathbb{R}^{m,n}$, $B = (b_{hj}) \in \mathbb{R}^{n,p}$ and $C = (c_{hj}) \in \mathbb{R}^{n,p}$. The
equality $A(B + C) = AB + AC$ is proven again by a direct computation of
the $ij$-entry for both sides:

$$[A(B + C)]_{ij} = \sum_{h=1}^{n} a_{ih}(b_{hj} + c_{hj}) = \sum_{h=1}^{n} a_{ih} b_{hj} + \sum_{h=1}^{n} a_{ih} c_{hj} = [AB]_{ij} + [AC]_{ij} = [AB + AC]_{ij}.$$

(iii) This is immediate. □
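The identities of Proposition 4.1.10 can be spot-checked on small integer matrices. This is only a numerical sanity check on hypothetical sample data, not a substitute for the proof; the helper names `matmul`, `add` and `scale` are assumptions of this sketch.

```python
def matmul(A, B):
    """Row by column product of A (m x n) and B (n x p), as lists of rows."""
    return [[sum(A[i][h] * B[h][k] for h in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    """Product of the scalar lam with the matrix A."""
    return [[lam * a for a in row] for row in A]

# Hypothetical sample matrices with compatible sizes.
A = [[1, 2], [0, 1], [3, -1]]        # 3 x 2
B = [[2, 0, 1], [1, -1, 4]]          # 2 x 3
B2 = [[0, 1, 0], [2, 2, -3]]         # 2 x 3
C = [[1], [0], [2]]                  # 3 x 1

print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))             # associativity
print(matmul(A, add(B, B2)) == add(matmul(A, B), matmul(A, B2)))      # distributivity
print(scale(5, matmul(A, B)) == matmul(scale(5, A), B) == matmul(A, scale(5, B)))
```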

The matrix product in $\mathbb{R}^{n,n}$ is inner and it has a neutral element, a multiplication unit.

Definition 4.1.11 The unit matrix of order $n$, denoted by $I_n$, is the element in $\mathbb{R}^{n,n}$
given by

$$I_n = (\delta_{ij}), \quad \text{with} \quad \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}.$$

The diagonal entries of $I_n$ are all 1, while its off-diagonal entries are all zero.

Remark 4.1.12 It is easy to prove that, with $A \in \mathbb{R}^{m,n}$, one has

$$A I_n = A \quad \text{and} \quad I_m A = A.$$


Proposition 4.1.13 The space $\mathbb{R}^{n,n}$ of square matrices of order $n$, endowed with the
sum and the product as defined above, is a non abelian ring.

Proof Recall the definition of a ring given in A.1.6. The matrix product is an inner
operation in $\mathbb{R}^{n,n}$, so the claim follows from the fact that $(\mathbb{R}^{n,n}, +, 0_{\mathbb{R}^{n,n}})$ is an abelian
group and the results of Proposition 4.1.10. □

Definition 4.1.14 A matrix $A \in \mathbb{R}^{n,n}$ is said to be invertible (also non-singular or
non-degenerate) if there exists a matrix $B \in \mathbb{R}^{n,n}$, such that $AB = BA = I_n$. Such
a matrix $B$ is denoted by $A^{-1}$ and is called the inverse of $A$.

Definition 4.1.15 If a matrix is non invertible, then it is called singular or degenerate.

Exercise 4.1.16 An element of the ring $\mathbb{R}^{n,n}$ need not be invertible. The matrix

A  = 

is  invertible,  with  inverse 

A" 


e  R 


2,2 


as  it  can  be  easily  checked.  On  the  other  hand,  the  matrix 




A 


is non invertible, for any value of the parameter $k \in \mathbb{R}$. It is easy to verify that the
matrix equation


has  no  solutions. 

Proposition 4.1.17 The subset of invertible elements in $\mathbb{R}^{n,n}$ is a group with respect
to the matrix product. It is called the general linear group of order $n$ and is denoted
by $GL(n, \mathbb{R})$ or simply by $GL(n)$.

Proof Recall the definition of a group in A.2.7. We observe first that if $A$ and $B$ are
both invertible then $AB$ is invertible since $AB(B^{-1}A^{-1}) = (B^{-1}A^{-1})AB = I_n$; this
means that $(AB)^{-1} = B^{-1}A^{-1}$ so $GL(n)$ is closed under the matrix product. It is
evident that $I_n^{-1} = I_n$ and that if $A$ is invertible, then $A$ is the inverse of $A^{-1}$, thus
the latter is invertible. □

Notice that since $AB$ is in general different from $BA$ the group $GL(n)$ is non abelian.
As an example, the non commuting matrices $A$ and $B$ considered in Remark 4.1.9
are both invertible.
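The identity $(AB)^{-1} = B^{-1}A^{-1}$ can be checked on a pair of non commuting matrices of order 2. The sketch below makes two assumptions worth flagging: the matrices are hypothetical samples (not those of Remark 4.1.9), and the inverse is computed with the standard closed formula for $2 \times 2$ matrices, which the text has not introduced at this point. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def matmul(A, B):
    """Row by column product, matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def inverse2(M):
    """Inverse of a 2 x 2 matrix via the closed formula (assumed, not from the text)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical sample matrices in GL(2).
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
I2 = [[1, 0], [0, 1]]

print(matmul(A, B) != matmul(B, A))                                   # A and B do not commute
print(matmul(A, inverse2(A)) == I2)                                   # A is invertible
print(inverse2(matmul(A, B)) == matmul(inverse2(B), inverse2(A)))     # (AB)^-1 = B^-1 A^-1
```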

Definition 4.1.18 Given $A = (a_{ij}) \in \mathbb{R}^{m,n}$ its transpose, denoted by ${}^tA$, is the matrix
obtained from $A$ when exchanging its rows with its columns, that is ${}^tA = (b_{ij}) \in \mathbb{R}^{n,m}$
with $b_{ij} = a_{ji}$.

Exercise  4.1.19  The  matrix 


has transpose ${}^tA \in \mathbb{R}^{3,2}$ given by


${}^tA =$


Proposition 4.1.20 The following identities hold:

(i) ${}^t(A + B) = {}^tA + {}^tB$, for any $A, B \in \mathbb{R}^{m,n}$,

(ii) ${}^t(AB) = {}^tB \, {}^tA$, for $A \in \mathbb{R}^{m,n}$ and $B \in \mathbb{R}^{n,p}$,

(iii) if $A \in GL(n)$ then ${}^tA \in GL(n)$ that is, if $A$ is invertible its transpose is invertible
as well with $({}^tA)^{-1} = {}^t(A^{-1})$.

Proof (i) It is immediate.

(ii) Given $A = (a_{ij})$ and $B = (b_{ij})$, denote ${}^tA = (a'_{ij})$ and ${}^tB = (b'_{ij})$ with $a'_{ij} = a_{ji}$
and $b'_{ij} = b_{ji}$. If $AB = (c_{ij})$, then $c_{ij} = \sum_{h=1}^{n} a_{ih} b_{hj}$. The $ij$-element in ${}^t(AB)$
is then $\sum_{h=1}^{n} a_{jh} b_{hi}$; the $ij$-element in ${}^tB \, {}^tA$ is $\sum_{h=1}^{n} b'_{ih} a'_{hj} = \sum_{h=1}^{n} b_{hi} a_{jh}$ and these elements
clearly coincide, for any $i$ and $j$.

(iii) It is enough to show that ${}^t(A^{-1}) \, {}^tA = I_n$. From (ii) one has indeed

$${}^t(A^{-1}) \, {}^tA = {}^t(A A^{-1}) = {}^tI_n = I_n.$$

This finishes the proof. □
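The rule ${}^t(AB) = {}^tB \, {}^tA$ can be verified on a small example. In this sketch matrices are lists of rows, the sample matrices are hypothetical, and `transpose` and `matmul` are names chosen for the example.

```python
def transpose(A):
    """Exchange rows and columns: entry (i, j) of the result is a_ji."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Row by column product, matrices given as lists of rows."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, -1], [3, 0, 1]]          # sample matrix, 2 x 3
B = [[1, 2], [2, 1], [3, 4]]         # sample matrix, 3 x 2

lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs, rhs)   # [[2, 6], [0, 10]] [[2, 6], [0, 10]]
```

Note that the order of the factors is reversed: the product `matmul(transpose(A), transpose(B))` is a $3 \times 3$ matrix and cannot equal `lhs`.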


Definition 4.1.21 A square matrix of order $n$, $A = (a_{ij}) \in \mathbb{R}^{n,n}$, is said to be symmetric
if ${}^tA = A$ that is, if for any $i, j$ it holds that $a_{ij} = a_{ji}$.

Exercise 4.1.22 The matrix

$$A = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 0 & 1 \\ -1 & 1 & -1 \end{pmatrix}$$

is symmetric.


4.2  The  Rank  of  a  Matrix 

Definition 4.2.1 Let $A = (a_{ij})$ be a matrix in $\mathbb{R}^{m,n}$. We have seen that the $m$ rows
of $A$,

$$R_1 = (a_{11}, \dots, a_{1n}), \quad \dots, \quad R_m = (a_{m1}, \dots, a_{mn}),$$

are elements (vectors, indeed) in $\mathbb{R}^n$. By $R(A)$ we denote the vector subspace of $\mathbb{R}^n$
generated by the vectors $R_1, \dots, R_m$ that is,

$$R(A) = \mathcal{L}(R_1, \dots, R_m).$$

We call $R(A)$ the row space of $A$. Analogously, given the columns

$$C_1 = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix}, \quad \dots, \quad C_n = \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix}$$

of $A$, we define the vector subspace $C(A)$ in $\mathbb{R}^m$,

$$C(A) = \mathcal{L}(C_1, \dots, C_n)$$

as the column space of $A$.

Remark 4.2.2 Clearly $C({}^tA) = R(A)$ since the columns of ${}^tA$ are the rows of $A$.

Theorem 4.2.3 Given $A = (a_{ij}) \in \mathbb{R}^{m,n}$ one has that $\dim(R(A)) = \dim(C(A))$.

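Proof Since $A$ is fixed, to simplify notations we set $R = R(A)$ and $C = C(A)$.
The first step is to show that $\dim(C) \leq \dim(R)$. Let $\dim R = r$; up to a permutation,
we can take the first $r$ rows in $A$ as linearly independent. The remaining rows
$R_{r+1}, \dots, R_m$ are elements in $R = \mathcal{L}(R_1, \dots, R_r)$ and we can write

$$A = \begin{pmatrix} R_1 \\ \vdots \\ R_r \\ \sum_{i=1}^{r} \lambda_i^{r+1} R_i \\ \vdots \\ \sum_{i=1}^{r} \lambda_i^{m} R_i \end{pmatrix} = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{r1} & \dots & a_{rn} \\ \sum_{i=1}^{r} \lambda_i^{r+1} a_{i1} & \dots & \sum_{i=1}^{r} \lambda_i^{r+1} a_{in} \\ \vdots & & \vdots \\ \sum_{i=1}^{r} \lambda_i^{m} a_{i1} & \dots & \sum_{i=1}^{r} \lambda_i^{m} a_{in} \end{pmatrix}$$

for suitable scalars $\lambda_i^j$ (with $i \in \{1, \dots, r\}$ and $j \in \{r+1, \dots, m\}$). Given $h = 1, \dots, n$,
consider the $h$-th column,

$$C_h = \begin{pmatrix} a_{1h} \\ a_{2h} \\ \vdots \\ a_{rh} \\ \sum_{i=1}^{r} \lambda_i^{r+1} a_{ih} \\ \vdots \\ \sum_{i=1}^{r} \lambda_i^{m} a_{ih} \end{pmatrix} = a_{1h} \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ \lambda_1^{r+1} \\ \vdots \\ \lambda_1^{m} \end{pmatrix} + \dots + a_{rh} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \lambda_r^{r+1} \\ \vdots \\ \lambda_r^{m} \end{pmatrix}.$$

This means that $C$ is generated by the $r$ columns

$$\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ \lambda_1^{r+1} \\ \vdots \\ \lambda_1^{m} \end{pmatrix}, \quad \dots, \quad \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \lambda_r^{r+1} \\ \vdots \\ \lambda_r^{m} \end{pmatrix},$$

so we have $\dim(C) \leq r = \dim R$. By exchanging the rows with columns, a similar
argument shows also that $\dim(C) \geq \dim(R)$ thus the claim. □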

This theorem shows that $\dim(R(A)) = \dim(C(A))$ is an integer number that characterises
$A$.

Definition 4.2.4 Given a matrix $A \in \mathbb{R}^{m,n}$, its rank is the number

$$\mathrm{rk}(A) = \dim(C(A)) = \dim(R(A))$$

that is the common dimension of its space of rows, or columns.

Corollary 4.2.5 For any $A \in \mathbb{R}^{m,n}$ one has $\mathrm{rk}(A) = \mathrm{rk}({}^tA)$.

Proof This follows from Remark 4.2.2 since $C({}^tA) = R(A)$. □

It is clear that $\mathrm{rk}(A) \leq \min(m, n)$.

Definition 4.2.6 A matrix $A \in \mathbb{R}^{m,n}$ has maximal rank if $\mathrm{rk}(A) = \min(m, n)$.

Our  task  next  is  to  give  methods  to  compute  the  rank  of  a  given  matrix.  We  first 
identify  a  class  of  matrices  whose  rank  is  easy  to  determine. 

Remark 4.2.7 It is immediate to convince oneself that the rank of a matrix $A$ does not
change by enlarging it with an arbitrary number of zero rows or columns. Moreover,
if a matrix $B$ is obtained from a matrix $A$ by a permutation of either its rows or
columns, that is, if it is

$$A = \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_m \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} R_{\sigma(1)} \\ R_{\sigma(2)} \\ \vdots \\ R_{\sigma(m)} \end{pmatrix}$$

(where $\sigma$ denotes a permutation of $m$ objects) or if

$$A = (C_1, \dots, C_n) \quad \text{and} \quad B' = (C_{\sigma'(1)}, \dots, C_{\sigma'(n)})$$

(where $\sigma'$ denotes a permutation of $n$ objects), then $\mathrm{rk}(A) = \mathrm{rk}(B) = \mathrm{rk}(B')$. These
equalities are true since the dimension of a vector space does not depend on the
ordering of its basis.

Definition 4.2.8 A square matrix $A = (a_{ij}) \in \mathbb{R}^{n,n}$ is called diagonal if $a_{ij} = 0$ for
$i \neq j$.

Exercise 4.2.9 The following matrix is diagonal,

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -3 \end{pmatrix}.$$

Its rows and columns are vectors in $\mathbb{R}^4$, with $R_1 = e_1$, $R_2 = 2e_2$, $R_3 = 0$, $R_4 = -3e_4$
with respect to the canonical basis. As a consequence $R(A) = \mathcal{L}(R_1, R_2, R_3, R_4) =
\mathcal{L}(e_1, e_2, e_4)$ so that $\mathrm{rk}(A) = 3$.

The rank of a diagonal matrix of order $n$ coincides with the number of its non zero
diagonal elements, since, as the previous exercise shows, its non zero rows or columns
correspond to multiples of vectors of the canonical basis of $\mathbb{R}^n$. Beside the diagonal
ones, a larger class of matrices for which the rank is easy to compute is given in the
following definition.

Definition 4.2.10 Let $A = (a_{ij})$ be a square matrix in $\mathbb{R}^{n,n}$. The matrix $A$ is called
upper triangular if $a_{ij} = 0$ for $i > j$. An upper triangular matrix for which $a_{ii} \neq 0$
for any $i$, is called a complete upper triangular matrix.

Exercise 4.2.11 Given

$$B = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 2 & 2 \\ 0 & 0 & -1 \end{pmatrix},$$

then $A$ is upper triangular and $B$ is complete upper triangular.

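Theorem 4.2.12 Let $A \in \mathbb{R}^{n,n}$ be a complete upper triangular matrix. Then,

$$\mathrm{rk}(A) = n.$$

Proof Let

$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ 0 & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & a_{nn} \end{pmatrix}.$$

To prove the claim we show that the $n$ columns $C_1, \dots, C_n$ of $A$ are linearly independent.
The equation $\lambda_1 C_1 + \dots + \lambda_n C_n = 0$ can be written in the form

$$\begin{pmatrix} \lambda_1 a_{11} + \dots + \lambda_{n-1} a_{1\,n-1} + \lambda_n a_{1n} \\ \vdots \\ \lambda_{n-1} a_{n-1\,n-1} + \lambda_n a_{n-1\,n} \\ \lambda_n a_{nn} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}.$$

Equating term by term, one has for the $n$-th component $\lambda_n a_{nn} = 0$, which gives
$\lambda_n = 0$ since $a_{nn} \neq 0$. For the $(n-1)$-th component, one has

$$\lambda_{n-1} a_{n-1\,n-1} + \lambda_n a_{n-1\,n} = 0$$

which gives, from $\lambda_n = 0$ and $a_{n-1\,n-1} \neq 0$, that $\lambda_{n-1} = 0$. This can be extended
step by step to all components, thus getting $\lambda_n = \lambda_{n-1} = \dots = \lambda_1 = 0$. □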

The notion of upper triangularity can be extended to non square matrices.

Definition 4.2.13 A matrix $A = (a_{ij}) \in \mathbb{R}^{m,n}$ is called upper triangular if it satisfies
$a_{ij} = 0$ for $i > j$ and complete upper triangular if it is upper triangular with $a_{ii} \neq 0$
for any $i$.

Remark 4.2.14 Given a matrix $A \in \mathbb{R}^{m,n}$ set $p = \min(m, n)$. If $A$ is a complete
upper triangular matrix, the submatrix $B$ made by the first $p$ rows of $A$ when $m > n$,
or the first $p$ columns of $A$ when $m < n$, is a square complete upper triangular matrix
of order $p$.

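Exercise 4.2.15 The following matrices are complete upper triangular:

$$A = \begin{pmatrix} 1 & 0 & -3 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad A' = \begin{pmatrix} 1 & 2 & 3 & 9 \\ 0 & 2 & 0 & 7 \\ 0 & 0 & 4 & -3 \end{pmatrix}.$$

The submatrices

$$B = \begin{pmatrix} 1 & 0 & -3 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad B' = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}$$

are (square) complete upper triangular as mentioned in the previous remark.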

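Corollary 4.2.16 If $A \in \mathbb{R}^{m,n}$ is a complete upper triangular matrix then $\mathrm{rk}(A) =
\min(m, n)$.

Proof We consider two cases.

• $n \geq m$. One has $\mathrm{rk}(A) \leq \min(m, n) = m$, with

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1\,m-1} & a_{1m} & * & \dots & * \\ 0 & a_{22} & a_{23} & \dots & a_{2\,m-1} & a_{2m} & * & \dots & * \\ 0 & 0 & a_{33} & \dots & a_{3\,m-1} & a_{3m} & * & \dots & * \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 0 & a_{mm} & * & \dots & * \end{pmatrix}.$$

Let $B$ be the submatrix of $A$ given by its first $m$ columns. Since $B$ is
(Remark 4.2.14) a complete upper triangular square matrix of order $m$, the columns
$C_1, \dots, C_m$ are linearly independent. This means that $\mathrm{rk}(A) \geq m$ and the claim
follows.

• $n < m$. One has $\mathrm{rk}(A) \leq \min(m, n) = n$, with

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & a_{22} & a_{23} & \dots & a_{2n} \\ 0 & 0 & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & a_{nn} \\ 0 & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}.$$

By deleting all zero rows, one gets a matrix of the previous type, thus
$\mathrm{rk}(A) = n$. □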

The  matrices  A  and  A'  in  the  Exercise  4.2.15  are  both  complete  upper  triangular. 
Their  rank  is  3. 

Remark 4.2.17 The notions introduced in the present section can be formulated by
considering columns instead of rows. One has:

• A matrix $A \in \mathbb{R}^{m,n}$ is called lower triangular if $a_{ij} = 0$ for $i < j$. A lower triangular
matrix is called complete if $a_{ii} \neq 0$ for any $i$.

• Given $A \in \mathbb{R}^{m,n}$, one has that $A$ is (complete) upper triangular if and only if ${}^tA$ is
(complete) lower triangular.

• If $A \in \mathbb{R}^{m,n}$ is a complete lower triangular matrix then $\mathrm{rk}(A) = \min(m, n)$.


4.3  Reduced  Matrices 

Definition 4.3.1 A matrix $A \in \mathbb{R}^{m,n}$ is said to be reduced by rows if any non zero
row has a non zero element such that the entries below it are all zero. Such an element,
which is not necessarily unique if $m < n$, is called the pivot of its row.

Exercise 4.3.2 The matrix

$$A = \begin{pmatrix} 0 & 1 & 3 \\ 0 & 0 & 0 \\ 2 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$$

is reduced by rows. The pivot element for the first row is 1, the pivot element for the
third row is 2, the pivot element for the fourth row is $-1$. Note that $\mathrm{rk}(A) = 3$ since
the three non zero rows are linearly independent.

Exercise  4.3.3  Any  complete  upper  triangular  matrix  is  reduced  by  rows. 

Theorem 4.3.4 The rank of a matrix $A$ which is reduced by rows coincides with the
number of its non zero rows. Indeed, the non zero rows of a reduced by rows matrix
are linearly independent.

Proof Let $A$ be a reduced by rows matrix and let $A'$ be the submatrix of $A$ obtained
by deleting the zero rows of $A$. From Remark 4.2.7, $\mathrm{rk}(A') = \mathrm{rk}(A)$. Let $A''$
be the matrix obtained from $A'$ by the following permutation of its columns: the first
column of $A''$ is the column of $A'$ containing the pivot element for the first row of
$A'$, the second column of $A''$ is the column of $A'$ containing the pivot element for the
second row of $A'$ and so on. By such a permutation $A''$ turns out to be a complete
upper triangular matrix and again from Remark 4.2.7 it is $\mathrm{rk}(A') = \mathrm{rk}(A'')$. Since
$A''$ is complete upper triangular its rank is given by the number of its rows; hence the rank
of $A$ is given by the number of non zero rows of $A$. □

Since  the  proof  of  such  a  theorem  is  constructive,  an  example  clarifies  it. 

Example 4.3.5 Let us consider the following matrix $A$ which is reduced by rows (its
pivot elements are bold typed):

$$A = \begin{pmatrix} 1 & \mathbf{-1} & 1 & 1 \\ 0 & 0 & \mathbf{2} & -1 \\ \mathbf{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & \mathbf{1} \end{pmatrix}.$$

The first column of $A'$ is the column containing the pivot element for the first row
of $A$, the second column of $A'$ is the column containing the pivot element for the
second row of $A$ and so on. The matrix $A'$ is then

$$A' = \begin{pmatrix} -1 & 1 & 1 & 1 \\ 0 & 2 & 0 & -1 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and $A'$ is complete upper triangular; so $\mathrm{rk}(A) = \mathrm{rk}(A') = 4$.
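The construction used in the proof of Theorem 4.3.4 can be sketched in code: locate a pivot in each non zero row, then permute the columns so that the pivots sit on the diagonal. The function names are assumptions of this sketch; it picks the first admissible pivot in each row, which is one valid choice when pivots are not unique.

```python
def pivot_columns(A):
    """For each row, the column index of a pivot (None for a zero row).

    Raises ValueError if A is not reduced by rows.
    """
    m = len(A)
    pivots = []
    for i, row in enumerate(A):
        if all(x == 0 for x in row):
            pivots.append(None)
            continue
        for j, x in enumerate(row):
            # A pivot is a non zero entry with only zeros below it.
            if x != 0 and all(A[k][j] == 0 for k in range(i + 1, m)):
                pivots.append(j)
                break
        else:
            raise ValueError("matrix is not reduced by rows")
    return pivots

def permute_to_triangular(A):
    """Drop zero rows and reorder columns so that pivots come first, row by row."""
    pivots = pivot_columns(A)
    order = [j for j in pivots if j is not None]
    order += [j for j in range(len(A[0])) if j not in order]
    return [[row[j] for j in order] for row, p in zip(A, pivots) if p is not None]

A = [[1, -1, 1, 1], [0, 0, 2, -1], [2, 0, 0, 0], [0, 0, 0, 1]]
print(pivot_columns(A))           # [1, 2, 0, 3]
print(permute_to_triangular(A))   # [[-1, 1, 1, 1], [0, 2, 0, -1], [0, 0, 2, 0], [0, 0, 0, 1]]
rank = sum(p is not None for p in pivot_columns(A))
print(rank)                       # 4
```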

Remark 4.3.6 As we noticed in Remark 4.2.17, the notions introduced above can be
formulated by exchanging the role of the columns with that of the rows of a matrix.

• A matrix $A \in \mathbb{R}^{m,n}$ is said to be reduced by columns if any non zero column has
a non zero element such that the entries at its right are all zero. Such an element,
which is not necessarily unique, is called the pivot of its column.

• If $A$ is a reduced by columns matrix its rank coincides with the number of its non
zero columns. The non zero columns are linearly independent.

• By mimicking the proof of Theorem 4.3.4 it is clear that a matrix $A$ is reduced
by rows if and only if ${}^tA$ is reduced by columns.

4.4 Reduction of Matrices


In  the  previous  section  we  have  learnt  how  to  compute  the  rank  of  a  reduced  matrix. 
In  this  section  we  outline  a  procedure  that  associates  to  any  given  matrix  a  reduced 
matrix  having  the  same  rank. 

We  shall  consider  the  following  set  of  transformations  acting  on  the  rows  of  a  matrix. 
They  are  called  elementary  transformations  of  rows  and  their  action  preserves  the 
vector  space  structure  of  the  space  of  rows. 

• ($\lambda$) The transformation $R_i \mapsto \lambda R_i$ that replaces the row $R_i$ with its multiple $\lambda R_i$,
with $\mathbb{R} \ni \lambda \neq 0$,

• ($e$) The transformation $R_i \leftrightarrow R_j$, that exchanges the rows $R_i$ and $R_j$,

• ($D$) The transformation $R_i \mapsto R_i + aR_j$ that replaces the row $R_i$ with the linear
combination $R_i + aR_j$, with $a \in \mathbb{R}$ and $i \neq j$.

Given a matrix $A \in \mathbb{R}^{m,n}$ the matrix $A' \in \mathbb{R}^{m,n}$ is said to be row-transformed from
$A$ if $A'$ is obtained from $A$ by the action of a finite number of the elementary
transformations ($\lambda$), ($e$) and ($D$) listed above.

Proposition 4.4.1 Let $A \in \mathbb{R}^{m,n}$ and $A' \in \mathbb{R}^{m,n}$ be row-transformed from $A$. Then
$R(A) = R(A')$ as vector spaces and $\mathrm{rk}(A) = \mathrm{rk}(A')$.

Proof It is obvious that for an elementary transformation ($e$) or ($\lambda$) the vector spaces
$R(A)$ and $R(A')$ coincide. Let us take $A'$ to be row-transformed from $A$ by a
transformation ($D$). Since

$$R(A) = \mathcal{L}(R_1, \dots, R_{i-1}, R_i, R_{i+1}, \dots, R_m)$$

and

$$R(A') = \mathcal{L}(R_1, \dots, R_{i-1}, R_i + aR_j, R_{i+1}, \dots, R_m),$$

it is clear that $R(A') \subseteq R(A)$. To prove the opposite inclusion, $R(A) \subseteq R(A')$, it is
enough to show that the row $R_i$ in $A$ is in the linear span of the rows of $A'$. Indeed

$$R_i = (R_i + aR_j) - aR_j,$$

thus the claim. □


Exercise 4.4.2 Let

$$A = [\text{matrix not recoverable}]$$

We act on $A$ with the following (D) elementary transformations:

$$A \xrightarrow{R_2 \mapsto R_2 - 2R_1} A' \xrightarrow{R'_3 \mapsto R'_3 + R'_1} A'' \xrightarrow{\ \cdots\ } A'''.$$

The matrix $A'''$ is reduced by rows with $\mathrm{rk}(A''') = 3$. From the proposition above, we conclude that $\mathrm{rk}(A) = \mathrm{rk}(A''') = 3$. This exercise shows how the so-called Gauss algorithm works.

Proposition  4.4.3  Given  any  matrix  A  it  is  always  possible  to  find  a  finite  sequence 
of  type  ( D )  elementary  transformations  whose  action  results  in  a  matrix  (say  B) 
which  is  reduced  by  rows. 

Proof Let $A = (a_{ij}) \in \mathbb{R}^{m,n}$. We denote by $R_i$ the first non zero row in $A$ and by $a_{ij}$ the first non zero element in $R_i$. In order to obtain a matrix $A'$ such that the elements under $a_{ij}$ are zero one acts with the following (D) transformations

$$R_k \mapsto R_k - a_{kj}\, a_{ij}^{-1} R_i, \quad \text{for any } k > i.$$

We denote such a transformed matrix by $A' = (a'_{ij})$. Notice that the first $i$ rows in $A'$ coincide with the first $i$ rows in $A$, with all the elements in the column $j$ below the element $a'_{ij} = a_{ij}$ being null. Next, let $R'_h$ be the first non zero row in $A'$ with $h > i$ and let $a'_{hp}$ be the first non zero element in $R'_h$. As before we now act with the following (D) elementary transformations

$$R'_k \mapsto R'_k - a'_{kp}\, {a'_{hp}}^{-1} R'_h, \quad \text{for any } k > h.$$

Let $A''$ be the matrix obtained with this transformation and iterate. It is clear that a finite number of iterations of this procedure yields a matrix $B$ which is, by construction, reduced by rows. □
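The procedure in the proof can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the names `reduce_by_rows` and `rank` are hypothetical, exact arithmetic is done with `fractions.Fraction`, and the pivot of each row is taken to be its first non zero entry, as in the proof.

```python
from fractions import Fraction


def reduce_by_rows(rows):
    """Reduce a matrix by rows using only type (D) transformations."""
    a = [[Fraction(x) for x in row] for row in rows]
    for i in range(len(a)):
        # column index of the first non zero entry of row i, if any
        j = next((c for c, x in enumerate(a[i]) if x != 0), None)
        if j is None:
            continue  # null row: nothing to eliminate below it
        for k in range(i + 1, len(a)):
            factor = a[k][j] / a[i][j]
            # R_k -> R_k - a_kj * a_ij^(-1) * R_i
            a[k] = [x - factor * y for x, y in zip(a[k], a[i])]
    return a


def rank(rows):
    """Rank = number of non zero rows of the reduced matrix."""
    return sum(1 for row in reduce_by_rows(rows) if any(x != 0 for x in row))
```

For instance, applied to the rows $(2, 1, -1, 1)$, $(3, 1, 1, -1)$, $(0, 1, 1, 9)$ the function `rank` returns 3. Since only type (D) transformations are used, fractions may appear in the intermediate rows; this does not affect the rank.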

By reduction by rows of a matrix $A$ we mean a finite sequence of elementary transformations on the rows of $A$ whose final image is a matrix $A'$ which is reduced by rows.

Remark 4.4.4 The proof of the Proposition 4.4.3 made use only of type (D) transformations. It is clear that, depending on the specific elements of the matrix one is considering, it can be easier to use also type (e) and (λ) transformations. The claim of the Proposition 4.4.1 does not change.
Exercise 4.4.5 Let us reduce by rows the following matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 9 \\ 1 & 3 & 1 & 5 \end{pmatrix}.$$

This matrix can be reduced as in the proof of the Proposition 4.4.3 by type (D) transformations alone. A look at it shows, however, that it is convenient to swap the first row with the fourth. We have

$$A \xrightarrow{R_1 \leftrightarrow R_4} B = \begin{pmatrix} 1 & 3 & 1 & 5 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 9 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$

It is evident that the matrix $B$ is already reduced by rows so we can write $\mathrm{rk}(A) = \mathrm{rk}(B) = 4$.

Exercise 4.4.6 Let us consider the matrix

$$A = \begin{pmatrix} 2 & 1 & -1 & 1 \\ 3 & 1 & 1 & -1 \\ 0 & 1 & 1 & 9 \end{pmatrix}.$$

To reduce $A$ we start with the type (D) transformation $R_2 \mapsto R_2 - \tfrac{3}{2} R_1$, that leads to

$$A' = \begin{pmatrix} 2 & 1 & -1 & 1 \\ 0 & -1/2 & 5/2 & -5/2 \\ 0 & 1 & 1 & 9 \end{pmatrix}.$$

Since we are interested in computing the rank of the matrix $A$, in order to avoid non integer matrix entries (which would give heavier computations) we can instead reduce by rows the matrix $A'$ as

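$$A' \xrightarrow{R'_2 \mapsto 2R'_2} A'' = \begin{pmatrix} 2 & 1 & -1 & 1 \\ 0 & -1 & 5 & -5 \\ 0 & 1 & 1 & 9 \end{pmatrix} \xrightarrow{R''_3 \mapsto R''_3 + R''_2} A''' = \begin{pmatrix} 2 & 1 & -1 & 1 \\ 0 & -1 & 5 & -5 \\ 0 & 0 & 6 & 4 \end{pmatrix}.$$

The matrix $A'''$ is upper triangular so we have $\mathrm{rk}(A) = 3$.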

The method of reducing by rows a matrix can be used to select a basis for a vector space $V$ given as a linear span of a system of vectors in some $\mathbb{R}^n$, that is $V = \mathcal{L}(v_1, \dots, v_r)$. To this end, given the vectors $v_1, \dots, v_r$ spanning $V$, one considers the matrix $A$ with rows $v_1, \dots, v_r$ or alternatively a matrix $B$ with columns $v_1, \dots, v_r$:

$$A = \begin{pmatrix} v_1 \\ \vdots \\ v_r \end{pmatrix}, \qquad B = \begin{pmatrix} v_1 & \cdots & v_r \end{pmatrix}.$$

One then has $R(A) = V$ using $A$, which is reduced by rows to a matrix

$$A' = \begin{pmatrix} w_1 \\ \vdots \\ w_r \end{pmatrix}.$$

Clearly $V = R(A) = R(A')$ and $\dim(V) = \dim(R(A)) = \mathrm{rk}(A) = \mathrm{rk}(A')$. That is, $\dim(V)$ is the number of non zero rows in $A'$ and these non zero rows in $A'$ are a basis for $V$.

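Exercise 4.4.7 In $\mathbb{R}^4$ consider the system of vectors $I = \{v_1, v_2, v_3, v_4, v_5\}$ with $v_1 = (1, -1, 2, 1)$, $v_2 = (-2, 2, -4, -2)$, $v_3 = (1, 1, 1, -1)$, $v_4 = (-1, 3, -3, -3)$, $v_5 = (1, 2, 1, 2)$. We would like to

(a) exhibit a basis $B$ for $V = \mathcal{L}(I) \subset \mathbb{R}^4$, with $B \subset I$,

(b) complete $B$ to a basis $C$ for $\mathbb{R}^4$.

For point (a) we let $A$ be the matrix whose rows are the vectors in $I$, that is,

$$A = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 2 & 1 \\ -2 & 2 & -4 & -2 \\ 1 & 1 & 1 & -1 \\ -1 & 3 & -3 & -3 \\ 1 & 2 & 1 & 2 \end{pmatrix}.$$

We reduce the matrix $A$ by rows using the following transformations. First, with $R_2 \mapsto R_2 + 2R_1$, $R_3 \mapsto R_3 - R_1$, $R_4 \mapsto R_4 + R_1$, $R_5 \mapsto R_5 - R_1$,

$$A \longrightarrow A' = \begin{pmatrix} 1 & -1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 2 & -1 & -2 \\ 0 & 2 & -1 & -2 \\ 0 & 3 & -1 & 1 \end{pmatrix},$$

and then, with $R'_4 \mapsto R'_4 - R'_3$, $R'_5 \mapsto 2R'_5 - 3R'_3$,

$$A' \longrightarrow A'' = \begin{pmatrix} 1 & -1 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 2 & -1 & -2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 8 \end{pmatrix}.$$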
As a result we have $\mathrm{rk}(A) = 3$ and then $\dim(V) = 3$. A basis for $V$ is for example given by the three non zero rows in $A''$ since $R(A) = R(A'')$. The basis $B$ is made by the vectors in $I$ corresponding to the three non zero rows in $A''$, that is $B = (v_1, v_3, v_5)$. Clearly, with the transformations given above one has also that

$$\begin{pmatrix} v_1 \\ v_3 \\ v_5 \end{pmatrix} \longmapsto B' = \begin{pmatrix} 1 & -1 & 2 & 1 \\ 0 & 2 & -1 & -2 \\ 0 & 0 & 1 & 8 \end{pmatrix}.$$

To complete the basis $B$ to a basis for $\mathbb{R}^4$ one can use the vectors of the canonical basis. From the form of the matrix $B'$ it is clear that it suffices to add the vector $e_4$ to the three row vectors in $B$ to meet the requirement:

$$\begin{pmatrix} v_1 \\ v_3 \\ v_5 \\ e_4 \end{pmatrix} \longmapsto \begin{pmatrix} 1 & -1 & 2 & 1 \\ 0 & 2 & -1 & -2 \\ 0 & 0 & 1 & 8 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

We can conclude that $C = (v_1, v_3, v_5, e_4)$.
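The selection of a basis $B \subset I$ and its completion can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function name `basis_and_completion` is hypothetical, only type (D) transformations without swaps are used (so the non zero reduced rows correspond to the original vectors in the same positions), and the completion adds the canonical vectors $e_j$ for every column $j$ that contains no pivot.

```python
from fractions import Fraction


def basis_and_completion(vectors):
    """Return (kept, missing): 0-based indices of the vectors of the
    system forming a basis of their span, and 0-based indices j of the
    canonical vectors e_j completing it to a basis of R^n."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n = len(rows[0])
    kept, pivots = [], []
    for i in range(len(rows)):
        # first non zero entry of the current (already reduced) row
        j = next((c for c, x in enumerate(rows[i]) if x != 0), None)
        if j is None:
            continue  # the i-th vector depends on the previous ones
        kept.append(i)
        pivots.append(j)
        for k in range(i + 1, len(rows)):
            factor = rows[k][j] / rows[i][j]
            rows[k] = [x - factor * y for x, y in zip(rows[k], rows[i])]
    missing = [j for j in range(n) if j not in pivots]
    return kept, missing
```

With the five vectors of the Exercise 4.4.7 the function returns the indices `[0, 2, 4]` and `[3]`, that is the basis $(v_1, v_3, v_5)$ completed by $e_4$.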

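Exercise 4.4.8 Let $I = \{v_1, v_2, v_3, v_4\} \subset \mathbb{R}^4$ be given by $v_1 = (0, 1, 2, 1)$, $v_2 = (0, 1, 1, 1)$, $v_3 = (0, 2, 3, 2)$, $v_4 = (1, 2, 2, 1)$. With $V = \mathcal{L}(I) \subset \mathbb{R}^4$:

(a) determine a basis $B$ for $V$, with $B \subset I$,

(b) complete $B$ to a basis $C$ for $\mathbb{R}^4$.

Let $A$ be the matrix whose rows are given by the vectors in $I$, that is,

$$A = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 3 & 2 \\ 1 & 2 & 2 & 1 \end{pmatrix}.$$

After swapping $R_1 \leftrightarrow R_4$, the matrix can be reduced following the lines above, leading to

$$\begin{pmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 3 & 2 \\ 0 & 1 & 2 & 1 \end{pmatrix} \xrightarrow{R_3 \mapsto R_3 - 2R_2} \begin{pmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{pmatrix} \xrightarrow{R_4 \mapsto R_4 - R_2} \begin{pmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

The last two rows coincide, so a further transformation $R_4 \mapsto R_4 - R_3$ gives a null fourth row and the rank is 3. We can then take $B = (v_4, v_2, v_3)$. Analogously to what we did in the Exercise 4.4.7, we have

$$\begin{pmatrix} v_4 \\ v_2 \\ v_3 \\ e_4 \end{pmatrix} \longmapsto \begin{pmatrix} 1 & 2 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and such a matrix shows that we can take $C = (v_4, v_2, v_3, e_4)$ as a basis for $\mathbb{R}^4$.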

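Exercise 4.4.9 Consider again the set $I$ given in the previous exercise. We now look for a basis $B \subset I$ via the constructive proof of the Theorem 2.4.2. The reduction by rows procedure can be used in this case as well. Start again with

$$A = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 3 & 2 \\ 1 & 2 & 2 & 1 \end{pmatrix}.$$

The swap operated in Exercise 4.4.8 is not admissible with the procedure in the Theorem 2.4.2 so we use type (D) transformations. With $R_2 \mapsto R_2 - R_1$, $R_3 \mapsto R_3 - 2R_1$, $R_4 \mapsto R_4 - R_1$, followed by $R'_3 \mapsto R'_3 - R'_2$, one has

$$A \longrightarrow \begin{pmatrix} 0 & 1 & 2 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & -1 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix} \xrightarrow{R'_3 \mapsto R'_3 - R'_2} \begin{pmatrix} 0 & 1 & 2 & 1 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}.$$

These computations show that $R'_3 - R'_2 = 0$, with $R'_3 = R_3 - 2R_1$ and $R'_2 = R_2 - R_1$. From these relations we have that $R_3 - R_2 - R_1 = 0$, which is equivalent to $v_3 = v_1 + v_2$: this shows that $v_3$ is a linear combination of $v_1$ and $v_2$, so we recover the set $\{v_1, v_2, v_4\}$ as a basis for $\mathcal{L}(I)$.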

The method we just illustrated in order to exhibit the basis of a vector subspace of $\mathbb{R}^n$ can be used with any vector space: the entries of the relevant matrix will be given by the components of a system of vectors with respect to a fixed basis.

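Exercise 4.4.10 Let $V = \mathcal{L}(I) \subset \mathbb{R}^{2,3}$ with $I = \{M_1, M_2, M_3, M_4\}$ given by

$$M_1 = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad M_2 = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \quad M_3 = \begin{pmatrix} 2 & 3 & 2 \\ 0 & 2 & 1 \end{pmatrix}, \quad M_4 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix}.$$

(a) exhibit a basis $B$ for $V$, with $B \subset I$,

(b) complete $B$ to a basis $C$ for $\mathbb{R}^{2,3}$.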
In order to use the reduction method we need to represent the matrices $M_1, M_2, M_3, M_4$ as row vectors. The components of these vectors will be given by the components of the matrices in a basis. This we may take to be the basis $\mathcal{E} = (E_{ij} \mid i = 1, 2;\ j = 1, 2, 3)$ of $\mathbb{R}^{2,3}$ made of elementary matrices as introduced in the proof of the Proposition 4.1.4. One has, for example,

$$M_1 = E_{11} + E_{12} + E_{13} + E_{22} = (1, 1, 1, 0, 1, 0)_{\mathcal{E}}.$$

Proceeding analogously we write the matrix

$$A = \begin{pmatrix} M_1 \\ M_2 \\ M_3 \\ M_4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 2 & 1 & 0 & 1 & 1 \\ 2 & 3 & 2 & 0 & 2 & 1 \\ 0 & 1 & 1 & 0 & 1 & -1 \end{pmatrix}.$$

With a suitable reduction we have

$$A \longmapsto \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 & -1 \end{pmatrix} \longmapsto \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \end{pmatrix}$$

from which we have $B = (M_1, M_2, M_4)$.

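We complete $B$ to a basis $C$ for $\mathbb{R}^{2,3}$ by considering 3 elements in $\mathcal{E}$ and the same reduction:

$$\begin{pmatrix} M_1 \\ M_2 \\ M_4 \\ E_{13} \\ E_{21} \\ E_{22} \end{pmatrix} \longmapsto \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$

Since this matrix is reduced by rows, the vectors $\{M_1, M_2, M_4, E_{13}, E_{21}, E_{22}\}$ are 6 linearly independent vectors in $\mathbb{R}^{2,3}$ (whose dimension is 6). This is enough to say that they give a basis $C$ for $\mathbb{R}^{2,3}$ completing $B$.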


4.5  The  Trace  of  a  Matrix 

We  end  this  chapter  with  another  useful  notion  for  square  matrices. 

Definition 4.5.1 The trace of a square matrix is the function $\mathrm{tr} : \mathbb{R}^{n,n} \to \mathbb{R}$ defined as follows. If $A = (a_{ij})$ its trace is given by

$$\mathrm{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{j=1}^{n} a_{jj}.$$

That is, the trace of a matrix is the sum of its diagonal elements.

The  following  proposition  proves  an  important  property  of  the  trace  function  for  a 
matrix. 

Proposition 4.5.2 With $A = (a_{ij})$ and $B = (b_{ij}) \in \mathbb{R}^{n,n}$ it holds that

$$\mathrm{tr}(AB) = \mathrm{tr}(BA).$$

Proof The entry $(i, j)$ in $AB$ is $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, while the entry $(i, j)$ in $BA$ is $(BA)_{ij} = \sum_{k=1}^{n} b_{ik} a_{kj}$. From the row by column product of square matrices one obtains

$$\mathrm{tr}(AB) = \sum_{j=1}^{n} (AB)_{jj} = \sum_{j=1}^{n} \sum_{k=1}^{n} a_{jk} b_{kj} = \sum_{k=1}^{n} \sum_{j=1}^{n} b_{kj} a_{jk} = \sum_{k=1}^{n} (BA)_{kk} = \mathrm{tr}(BA),$$

which is the claim. □


Because  of  the  above  property  one  says  that  the  trace  is  cyclic. 
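A quick numerical check of the Proposition 4.5.2 can be written as below. It is only a sketch: the helper names `trace` and `matmul` are hypothetical, and matrices are represented as lists of rows.

```python
def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[j][j] for j in range(len(a)))


def matmul(a, b):
    """Row by column product of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
# AB and BA are different matrices, yet their traces coincide
print(matmul(A, B), matmul(B, A))
print(trace(matmul(A, B)), trace(matmul(B, A)))
```

In this example $AB \neq BA$, while both traces are equal to 5.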
The notion of determinant of a matrix plays an important role in linear algebra. While the rank measures the linear independence of the row (or column) vectors of a matrix, the determinant (which is defined only for square matrices) is used to test the invertibility of a matrix and to construct explicitly the inverse of an invertible matrix.


5.1  A  Multilinear  Alternating  Mapping 

The  determinant  can  be  defined  as  an  abstract  function  by  using  multilinear  algebra. 
We  shall  define  it  constructively  and  using  a  recursive  procedure. 

Definition 5.1.1 The determinant of a 2 × 2 matrix is the map

$$\det : \mathbb{R}^{2,2} \to \mathbb{R}, \qquad A \mapsto \det(A) = |A|$$

defined as

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \ \mapsto \ \det(A) = a_{11} a_{22} - a_{12} a_{21}.$$

The above definition shows that the determinant can be thought of as a function of the column vectors of $A = (C_1, C_2)$, that is

$$\det : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \qquad (C_1, C_2) \mapsto a_{11} a_{22} - a_{12} a_{21}.$$

It is immediate to see that the map det is bilinear on the columns of $A$, that is

$$\det(\lambda C_1 + \lambda' C'_1, C_2) = \lambda \det(C_1, C_2) + \lambda' \det(C'_1, C_2)$$

$$\det(C_1, \lambda C_2 + \lambda' C'_2) = \lambda \det(C_1, C_2) + \lambda' \det(C_1, C'_2) \qquad (5.1)$$

for any $C_1, C'_1, C_2, C'_2 \in \mathbb{R}^2$ and any $\lambda, \lambda' \in \mathbb{R}$.

The map det is indeed alternating (or skew-symmetric), that is

$$\det(C_2, C_1) = -\det(C_1, C_2). \qquad (5.2)$$

From (5.2) the determinant of $A$ vanishes if the columns $C_1$ and $C_2$ coincide. More generally, $\det(A) = 0$ if $C_2 = \lambda C_1$ for $\lambda \in \mathbb{R}$, since, from (5.1)

$$\det(C_1, C_2) = \det(C_1, \lambda C_1) = \lambda \det(C_1, C_1) = 0.$$

Since the determinant map is bilinear and alternating, one also has

$$\det(C_1 + \lambda C_2, C_2) = \det(C_1, C_2) + \det(\lambda C_2, C_2) = \det(C_1, C_2).$$

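Exercise 5.1.2 Given the canonical basis $(e_1, e_2)$ for $\mathbb{R}^2$, we compute

$$\det(e_1, e_1) = \begin{vmatrix} 1 & 1 \\ 0 & 0 \end{vmatrix} = 0, \qquad \det(e_1, e_2) = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1,$$

$$\det(e_2, e_1) = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1, \qquad \det(e_2, e_2) = \begin{vmatrix} 0 & 0 \\ 1 & 1 \end{vmatrix} = 0.$$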


We  generalise  the  definition  of  determinant  to  3  x  3  and  further  to  n  x  n  matrices. 

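Definition 5.1.3 Given a 3 × 3 matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

one defines $\det : \mathbb{R}^{3,3} \to \mathbb{R}$ as

$$\det(A) = |A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \qquad (5.3)$$

$$= a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31}.$$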


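Exercise 5.1.4 Let us compute the determinant of the following matrix,

$$A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 2 & 1 & 0 \end{pmatrix}.$$

Using the first row as above one gets:

$$\det(A) = 1 \cdot \begin{vmatrix} 1 & -1 \\ 1 & 0 \end{vmatrix} - 0 \cdot \begin{vmatrix} 1 & -1 \\ 2 & 0 \end{vmatrix} + (-1) \cdot \begin{vmatrix} 1 & 1 \\ 2 & 1 \end{vmatrix} = 1 + 1 = 2.$$

It is evident that the map det can be read, as we showed above, as defined on the column vectors of $A = (C_1, C_2, C_3)$, that is

$$\det : \mathbb{R}^3 \times \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}, \qquad (C_1, C_2, C_3) \mapsto \det(A).$$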


Remark 5.1.5 It is easy to see that the map det defined in (5.3) is multilinear, that is it is linear in each column argument. Also, for any swap of the columns of $A$, $\det(A)$ changes its sign. This means that (5.3) is an alternating map (this property generalises the skew-symmetry of the det map on 2 × 2 matrices). For example,

$$\det(C_2, C_1, C_3) = -\det(C_1, C_2, C_3),$$

with analogous relations holding for any swap of the columns of $A$. Then $\det(A) = 0$ if one of the columns of $A$ is a multiple of another one, like in

$$\det(C_1, C_2, \lambda C_2) = \lambda \det(C_1, C_2, C_2) = -\lambda \det(C_1, C_2, C_2) = 0.$$

More generally $\det(A) = 0$ if one of the columns of $A$ is a linear combination of the others as in

$$\det(\lambda C_2 + \mu C_3, C_2, C_3) = \lambda \det(C_2, C_2, C_3) + \mu \det(C_3, C_2, C_3) = 0.$$

Exercise 5.1.6 If $(e_1, e_2, e_3)$ is the canonical basis for $\mathbb{R}^3$, generalising Exercise 5.1.2 one finds that $\det(e_i, e_j, e_k) = 0$ whenever two of the three vectors coincide, for instance $\det(e_i, e_i, e_j) = 0$, and that $\det(e_1, e_2, e_3) = \det(I_3) = 1$, with $I_3$ the 3 × 3 unit matrix.

We  have  seen  that  the  determinant  of  a  3  x  3  matrix  A  makes  use  of  the  deter¬ 
minant  of  a  2  x  2  matrix:  such  a  determinant  is  given  as  the  alternating  sum  of  the 
elements  in  the  first  row  of  A,  times  the  determinant  of  suitable  2x2  submatrices 
in  A.  This  procedure  is  generalised  to  define  the  determinant  of  n  x  n  matrices. 

Definition 5.1.7 Consider the matrix $A = (a_{ij}) \in \mathbb{R}^{n,n}$, or

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}.$$

For any pair $(i, j)$ we denote by $A_{ij}$ the $(n - 1) \times (n - 1)$ submatrix of $A$ obtained by erasing the $i$-th row and the $j$-th column of $A$. Firstly, the number $\det(A_{ij})$ is called the minor of the element $a_{ij}$. Then the cofactor $\alpha_{ij}$ of the element $a_{ij}$ (or associated with $a_{ij}$) is defined as

$$\alpha_{ij} = (-1)^{i+j} \det(A_{ij}).$$


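Exercise 5.1.8 With $A \in \mathbb{R}^{3,3}$ given by

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 3 & -2 & -1 \\ 2 & 5 & 0 \end{pmatrix}$$

we easily compute for instance,

$$A_{11} = \begin{pmatrix} -2 & -1 \\ 5 & 0 \end{pmatrix}, \qquad A_{12} = \begin{pmatrix} 3 & -1 \\ 2 & 0 \end{pmatrix}$$

and

$$\alpha_{11} = (-1)^{1+1} |A_{11}| = 5, \qquad \alpha_{12} = (-1)^{1+2} |A_{12}| = -2.$$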


Definition 5.1.9 Let $A = (a_{ij}) \in \mathbb{R}^{n,n}$. One defines its determinant by the formula

$$\det(A) = a_{11} \alpha_{11} + a_{12} \alpha_{12} + \cdots + a_{1n} \alpha_{1n}. \qquad (5.4)$$

Such an expression is also referred to as the expansion of the determinant of the matrix $A$ with respect to its first row.

The above definition is recursive: the determinant of an $n \times n$ matrix involves the determinants of $(n - 1) \times (n - 1)$ matrices, starting from the definition of the determinant of a 2 × 2 matrix. The Definition 5.1.3 is indeed the expansion with respect to the first row as written in (5.4).

That  the  determinant  det(A)  of  a  matrix  A  can  be  equivalently  defined  in  terms 
of  a  similar  expansion  with  respect  to  any  row  or  column  of  A  is  the  content  of  the 
following  important  theorem,  whose  proof  we  omit. 

Theorem 5.1.10 (Laplace) For any $i = 2, \dots, n$ it holds that

$$\det(A) = a_{i1} \alpha_{i1} + a_{i2} \alpha_{i2} + \cdots + a_{in} \alpha_{in}. \qquad (5.5)$$

This expression is called the expansion of the determinant of $A$ with respect to its $i$-th row.

For any $j = 1, \dots, n$, it holds that

$$\det(A) = a_{1j} \alpha_{1j} + a_{2j} \alpha_{2j} + \cdots + a_{nj} \alpha_{nj} \qquad (5.6)$$

and this expression is the expansion of the determinant of $A$ with respect to its $j$-th column.

The expansions (5.5) or (5.6) are called the cofactor expansion of the determinant with respect to the corresponding row or column.
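The recursive definition (5.4) and the expansions (5.5), (5.6) translate directly into code. The following is a minimal sketch, with hypothetical names `minor`, `det`, `expand_row` and `expand_col`; indices are 0-based, and as an extra assumption the recursion is stopped at 1 × 1 matrices (whose determinant is taken to be the single entry), which agrees with the 2 × 2 formula of the Definition 5.1.1.

```python
def minor(a, i, j):
    """Submatrix A_ij: erase the i-th row and the j-th column (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant via the expansion (5.4) with respect to the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j))
               for j in range(len(a)))


def expand_row(a, i):
    """Cofactor expansion (5.5) with respect to the i-th row (0-based)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for j in range(len(a)))


def expand_col(a, j):
    """Cofactor expansion (5.6) with respect to the j-th column (0-based)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for i in range(len(a)))
```

For the matrix of the Exercise 5.1.4 every row and column expansion gives the same value 2, as the Laplace theorem states. The recursion has factorial cost, so this sketch is meant for small matrices only.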

Exercise 5.1.11 Let $I_n \in \mathbb{R}^{n,n}$ be the $n \times n$ unit matrix. It is immediate to compute

$$\det(I_n) = 1.$$

From the Laplace theorem the following statement is obvious.

Corollary 5.1.12 Let $A \in \mathbb{R}^{n,n}$. Then $\det({}^tA) = \det(A)$.

Also, from the Laplace theorem it is immediate to see that $\det(A) = 0$ if $A$ has a null column or a null row. We can still think of the determinant of the matrix $A$ as a function defined on its columns. If $A = (C_1, \dots, C_n)$, one has $\det(A) = \det(C_1, \dots, C_n)$, that is

$$\det : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}, \qquad (C_1, \dots, C_n) \mapsto \det(A).$$

The  following  result,  that  can  be  proven  by  using  the  Definition  5.1.9,  generalises 
properties  already  seen  for  the  matrices  of  order  two  and  three. 

Proposition 5.1.13 Let $A = (C_1, \dots, C_n) \in \mathbb{R}^{n,n}$. One has the following properties:

(i) For any $\lambda, \lambda' \in \mathbb{R}$ and $C'_1 \in \mathbb{R}^n$, it holds that

$$\det(\lambda C_1 + \lambda' C'_1, C_2, \dots, C_n) = \lambda \det(C_1, C_2, \dots, C_n) + \lambda' \det(C'_1, C_2, \dots, C_n).$$

Analogous properties hold for any other column of $A$.

(ii) If $A' = (C_{\sigma(1)}, \dots, C_{\sigma(n)})$, where $\sigma = (\sigma(1), \dots, \sigma(n))$ is a permutation of the columns transforming $A \mapsto A'$, it holds that

$$\det(A') = (-1)^{\sigma} \det(A),$$

where $(-1)^{\sigma}$ is the parity of the permutation $\sigma$, that is $(-1)^{\sigma} = 1$ if $\sigma$ is given by an even number of swaps, while $(-1)^{\sigma} = -1$ if $\sigma$ is given by an odd number of swaps.

Corollary 5.1.14 Let $A = (C_1, \dots, C_n) \in \mathbb{R}^{n,n}$. Then,

(i) $\det(\lambda C_1, C_2, \dots, C_n) = \lambda \det(A)$,

(ii) if $C_i = C_j$ for a pair $i \neq j$, then $\det(A) = 0$,

(iii) $\det(\alpha_2 C_2 + \cdots + \alpha_n C_n, C_2, \dots, C_n) = 0$; that is the determinant of a matrix $A$ is zero if a column of $A$ is a linear combination of its other columns,

(iv) $\det(C_1 + \alpha_2 C_2 + \cdots + \alpha_n C_n, C_2, \dots, C_n) = \det(A)$.
Proof (i) it follows from the Proposition 5.1.13, with $\lambda' = 0$,

(ii) if $C_i = C_j$, the odd permutation $\sigma$ which swaps $C_i$ with $C_j$ does not change the matrix $A$; then from the Proposition 5.1.13, $\det(A) = -\det(A) \Rightarrow \det(A) = 0$,

(iii) from the Proposition 5.1.13 we can write

$$\det(\alpha_2 C_2 + \cdots + \alpha_n C_n, C_2, \dots, C_n) = \sum_{i=2}^{n} \alpha_i \det(C_i, C_2, \dots, C_n) = 0$$

since, by point (ii), one has $\det(C_i, C_2, \dots, C_n) = 0$ for any $i = 2, \dots, n$,

(iv) from the previous point we have

$$\det(C_1 + \alpha_2 C_2 + \cdots + \alpha_n C_n, C_2, \dots, C_n) = \det(C_1, C_2, \dots, C_n) + \sum_{i=2}^{n} \alpha_i \det(C_i, C_2, \dots, C_n) = \det(A).$$

This concludes the proof. □

Remark  5.1.15  From  the  Laplace  theorem  it  follows  that  the  determinant  of  A  is  an 
alternating  and  multilinear  function  even  when  it  is  defined  via  the  expansion  with 
respect  to  the  rows  of  A. 

We conclude this section with the next useful theorem, whose proof we omit.

Theorem 5.1.16 (Binet) Given $A, B \in \mathbb{R}^{n,n}$ it holds that

$$\det(AB) = \det(A) \det(B). \qquad (5.7)$$
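As a sanity check of the Binet theorem in the simplest case, one can compare both sides of (5.7) for a pair of 2 × 2 matrices. The sketch below uses hypothetical helper names `det2` and `matmul2` and checks a single example; it is of course not a proof.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix as in the Definition 5.1.1."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def matmul2(a, b):
    """Row by column product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]    # det(A) = -2
B = [[0, 1], [5, -2]]   # det(B) = -5
print(det2(matmul2(A, B)), det2(A) * det2(B))
```

Both printed numbers are equal to 10.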


5.2  Computing  Determinants  via  a  Reduction  Procedure 

The Definition 5.1.9 and the Laplace theorem allow one to compute the determinant of any square matrix. In this section we illustrate how the reduction procedure studied in the previous chapter can be used when computing a determinant. We start by considering upper triangular matrices.

Proposition 5.2.1 Let $A = (a_{ij}) \in \mathbb{R}^{n,n}$. If $A$ is diagonal then,

$$\det(A) = a_{11} a_{22} \cdots a_{nn}.$$

More generally, if $A$ is an upper (respectively a lower) triangular matrix, $\det(A) = a_{11} a_{22} \cdots a_{nn}$.


Proof The claim for a diagonal matrix is evident. With $A$ an upper (respectively a lower) triangular matrix, expanding $\det(A)$ with respect to the first column (respectively row) gives $\det(A) = a_{11} \det(A_{11})$, where the submatrix $A_{11}$ is again upper (respectively lower) triangular. The result then follows by a recursive argument. □

Remark 5.2.2 In Sect. 4.4 we defined the type (e), (λ) and (D) elementary transformations on the rows of a matrix. If $A$ is a square matrix, transformed under one of these transformations into the matrix $A'$, we have the following results:

• (e) : $\det(A') = -\det(A)$ (Proposition 5.1.13),

• (λ) : $\det(A') = \lambda \det(A)$ (Corollary 5.1.14),

• (D) : $\det(A') = \det(A)$ (Corollary 5.1.14).

It is evident that the above relations are valid also when $A$ is mapped into $A'$ with elementary transformations on its columns.

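Exercise 5.2.3 Let us use row transformations on the matrix $A$:

$$A \xrightarrow[R_3 \mapsto R_3 - R_1]{R_2 \mapsto R_2 - 2R_1} A' = \begin{pmatrix} 1 & 1 & -1 \\ 0 & -1 & 3 \\ 0 & 1 & 2 \end{pmatrix} \xrightarrow{R'_3 \mapsto R'_3 + R'_2} A'' = \begin{pmatrix} 1 & 1 & -1 \\ 0 & -1 & 3 \\ 0 & 0 & 5 \end{pmatrix}.$$

Since we have used only type (D) transformations, from the Remark 5.2.2 $\det(A) = \det(A'')$ and from Proposition 5.2.1 we have $\det(A'') = 1 \cdot (-1) \cdot 5 = -5$.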

Exercise 5.2.4 Via a sequence of elementary transformations we have

$$A \xrightarrow{C_1 \leftrightarrow C_2} A' \xrightarrow{R'_2 \mapsto R'_2 - 2R'_1} A'' \xrightarrow{\ \cdots\ } A'''$$

[matrix entries not recoverable]. Since we used once a type (e) transformation, $\det(A) = -\det(A''') = -3$.

Remark 5.2.5 The sequence of transformations defined in the Exercise 5.2.3 does not alter the space of rows of the matrix $A$, that is $R(A) = R(A'')$. The sequence of transformations defined in the Exercise 5.2.4 does alter both the spaces of rows and of columns of the matrix $A$.

Proposition 5.2.6 Let $A = (a_{ij}) \in \mathbb{R}^{n,n}$ be reduced by rows and without null rows. It holds that

$$\det(A) = (-1)^{\sigma} a_{1,\sigma(1)} \cdots a_{n,\sigma(n)}$$

where $a_{i,\sigma(i)}$ is the pivot element of the $i$-th row and $\sigma$ is the permutation of the columns mapping $A$ into the corresponding (complete) upper triangular matrix.

Proof Let $B = (b_{ij}) \in \mathbb{R}^{n,n}$ be the complete upper triangular matrix obtained from $A$ with the permutation $\sigma$. From the Proposition 5.1.13 we have $\det(A) = (-1)^{\sigma} \det(B)$, with $(-1)^{\sigma}$ the parity of $\sigma$. From the Proposition 5.2.1 we have $\det(B) = b_{11} b_{22} \cdots b_{nn}$, with $b_{11} = a_{1,\sigma(1)}, \dots, b_{nn} = a_{n,\sigma(n)}$ by construction, thus obtaining the claim. □

The  above  proposition  suggests  that  a  sequence  of  type  ( D )  transformations  on  the 
rows  of  a  square  matrix  simplifies  the  computation  of  its  determinant.  We  summarise 
this  suggestion  as  a  remark. 

Remark 5.2.7 In order to compute the determinant of the matrix $A \in \mathbb{R}^{n,n}$:

• reduce $A$ by rows with only type (D) transformations to a matrix $A'$; this is always possible from the Proposition 4.4.3. Then $\det(A) = \det(A')$ from the Remark 5.2.2;

• compute the determinant of $A'$. Then,

- if $A'$ has a null row, from the Corollary 5.1.14 one has $\det(A') = 0$;

- if $A'$ has no null rows, from the Proposition 5.2.6 one has

$$\det(A') = (-1)^{\sigma} a'_{1,\sigma(1)} \cdots a'_{n,\sigma(n)}$$

with $\sigma = (\sigma(1), \dots, \sigma(n))$.

Again, the result continues to hold by exchanging rows with columns.
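The two steps of the remark can be sketched in Python as follows. This is an illustrative sketch only: the name `det_by_reduction` is hypothetical, exact arithmetic relies on `fractions.Fraction`, the pivot of each row is its first non zero entry (as in the proof of the Proposition 4.4.3), and the parity $(-1)^{\sigma}$ is computed by counting the inversions of the sequence of pivot columns.

```python
from fractions import Fraction


def det_by_reduction(matrix):
    """Determinant via reduction by rows with type (D) transformations."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    pivots = []  # pivots[i] = column of the pivot of the i-th row
    for i in range(n):
        j = next((c for c, x in enumerate(a[i]) if x != 0), None)
        if j is None:
            return Fraction(0)  # a null row: the determinant vanishes
        pivots.append(j)
        for k in range(i + 1, n):
            factor = a[k][j] / a[i][j]
            # type (D) transformation: it does not change the determinant
            a[k] = [x - factor * y for x, y in zip(a[k], a[i])]
    # parity of the column permutation = parity of its number of inversions
    inversions = sum(1 for r in range(n) for s in range(r + 1, n)
                     if pivots[r] > pivots[s])
    result = Fraction(-1 if inversions % 2 else 1)
    for i, j in enumerate(pivots):
        result *= a[i][j]
    return result
```

For example, for the matrix with rows $(0,1,0,0)$, $(0,1,2,-1)$, $(0,0,0,9)$, $(1,3,1,5)$ the pivots fall in the columns 2, 3, 4, 1, an odd permutation, and the function returns $-18$.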

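Exercise 5.2.8 With the above method we have the following equalities,

$$\begin{vmatrix} 1 & 2 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 \\ 1 & 2 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & -1 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 1 & -1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1.$$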
5.3 Invertible Matrices

We now illustrate some use of the determinant in the study of invertible matrices.

Proposition 5.3.1 Given $A \in \mathbb{R}^{n,n}$, it holds that

$$\det(A) = 0 \iff \mathrm{rk}(A) < n.$$

Proof

($\Leftarrow$) : By hypothesis, the system of the $n$ columns $C_1, \dots, C_n$ of $A$ is not free, so there is at least a column of $A$ which is a linear combination of the other columns. From the Corollary 5.1.14 it is then $\det(A) = 0$.

($\Rightarrow$) : Suppose $\mathrm{rk}(A) = n$. With this assumption $A$ could be reduced by rows, with type (D) transformations, to a matrix $A'$ having no null rows since $\mathrm{rk}(A) = \mathrm{rk}(A') = n$. From the Proposition 5.2.6, $\det(A')$ is, up to a sign, the product of the pivot elements in $A'$ and since by hypothesis they would be non zero, we would have $\det(A') \neq 0$ and from the Remark 5.2.2 $\det(A) = \det(A') \neq 0$, thus contradicting the hypothesis. □

Remark 5.3.2 The equivalence in the above proposition can be stated as

$$\det(A) \neq 0 \iff \mathrm{rk}(A) = n.$$


Proposition 5.3.3 A matrix $A = (a_{ij}) \in \mathbb{R}^{n,n}$ is invertible (or non-singular) if and only if

$$\det(A) \neq 0.$$

Proof If $A$ is invertible, the matrix inverse $A^{-1}$ exists with $A A^{-1} = I_n$. From the Binet theorem, this yields $\det(A) \det(A^{-1}) = \det(I_n) = 1$ or $\det(A^{-1}) = (\det(A))^{-1} \neq 0$.

If $\det(A) \neq 0$, the inverse of $A$ is the matrix $B = (b_{ij})$ with elements

$$b_{ij} = \frac{1}{\det(A)}\, \alpha_{ji}$$

and $\alpha_{ji}$ the cofactor of $a_{ji}$ as in the Definition 5.1.7. Indeed, an explicit computation shows that

$$(AB)_{rs} = \sum_{k=1}^{n} a_{rk} b_{ks} = \frac{1}{\det(A)} \sum_{k=1}^{n} a_{rk} \alpha_{sk} = \begin{cases} \dfrac{\det(A)}{\det(A)} = 1 & \text{if } r = s \\ 0 & \text{if } r \neq s. \end{cases}$$

The result for $r = s$ is just the cofactor expansion of the determinant given by the Laplace theorem in Theorem 5.1.10, while the result for $r \neq s$ is known as the second Laplace theorem (whose discussion we omit). The above amounts to $AB = I_n$ so that $A$ is invertible with $B = A^{-1}$. □
Notice that in the inverse matrix $B$ there is an index transposition, that is up to the determinant factor, the element $b_{ij}$ of $B$ is given by the cofactor $\alpha_{ji}$ of $A$.

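Exercise 5.3.4 Let us compute the inverse of the matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

This is possible if and only if $|A| = ad - bc \neq 0$. In such a case,

$$A^{-1} = \frac{1}{|A|} \begin{pmatrix} \alpha_{11} & \alpha_{21} \\ \alpha_{12} & \alpha_{22} \end{pmatrix}$$

with $\alpha_{11} = d$, $\alpha_{21} = -b$, $\alpha_{12} = -c$, $\alpha_{22} = a$, so that we get the final result,

$$A^{-1} = \frac{1}{|A|} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$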

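Exercise 5.3.5 Let us compute the inverse of the matrix $A$ from the Exercise 5.1.4,

$$A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 2 & 1 & 0 \end{pmatrix}.$$

From the computation there $\det(A) = 2$, and explicit computations show that

$$\begin{array}{lll} \alpha_{11} = (+)\, 1 & \alpha_{12} = (-)\, 2 & \alpha_{13} = (+)\, (-1) \\ \alpha_{21} = (-)\, 1 & \alpha_{22} = (+)\, 2 & \alpha_{23} = (-)\, 1 \\ \alpha_{31} = (+)\, 1 & \alpha_{32} = (-)\, 0 & \alpha_{33} = (+)\, 1. \end{array}$$

It is then easy to find that

$$A^{-1} = \frac{1}{2} \begin{pmatrix} 1 & -1 & 1 \\ -2 & 2 & 0 \\ -1 & -1 & 1 \end{pmatrix}.$$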

Chapter  6 

Systems  of  Linear  Equations 




Linear equations and systems of linear equations are ubiquitous and an important tool in all of physics. In this chapter we shall present a systematic approach to them and to methods for their solutions.


6.1  Basic  Notions 


Definition 6.1.1 An equation in $n$ unknown variables $x_1, \dots, x_n$ with coefficients in $\mathbb{R}$ is called linear if it has the form

$$a_1 x_1 + \cdots + a_n x_n = b,$$

with $a_i \in \mathbb{R}$ and $b \in \mathbb{R}$. A solution for such a linear equation is an $n$-tuple of real numbers $(\alpha_1, \dots, \alpha_n) \in \mathbb{R}^n$ which, when substituted for the unknowns, yields an 'identity', that is

$$a_1 \alpha_1 + \cdots + a_n \alpha_n = b.$$

Exercise 6.1.2 It is easy to see that the element $(2, 6, 1) \in \mathbb{R}^3$ is a solution for the equation with real coefficients given by

$$3x_1 - 2x_2 + 7x_3 = 1.$$

Clearly, this is not the only solution for the equation: the element $(\tfrac{1}{3}, 0, 0)$ is for instance a solution of the same equation.

© Springer International Publishing AG, part of Springer Nature 2018
G. Landi and A. Zampini, Linear Algebra and Analytic Geometry

Definition 6.1.3 A collection of $m$ linear equations in the $n$ unknown variables $x_1, \dots, x_n$ and with real coefficients is called a linear system of $m$ equations in $n$ unknowns. We shall adopt the following notation

$$\Sigma : \begin{cases} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\ \qquad \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m \end{cases}$$

A solution for a given linear system is an $n$-tuple $(\alpha_1, \dots, \alpha_n)$ in $\mathbb{R}^n$ which simultaneously solves each equation of the system. The collection of the solutions of the system $\Sigma$ is then a subset of $\mathbb{R}^n$, denoted by $S_\Sigma$ and called the space of solutions of $\Sigma$.

A system $\Sigma$ is called compatible or solvable if its space of solutions is non void, $S_\Sigma \neq \emptyset$; it will be said to be incompatible if $S_\Sigma = \emptyset$.

Exercise 6.1.4 The element $(1, -1) \in \mathbb{R}^2$ is a solution of the system

$$\begin{cases} x + y = 0 \\ x - y = 2 \end{cases}.$$

The following system

$$\begin{cases} x + y = 0 \\ x + y = 1 \end{cases}$$

has no solutions.
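Checking whether a given n-tuple solves a system is a purely mechanical substitution. The following minimal sketch illustrates it; the function name `is_solution` and the list-based encoding of the coefficients are assumptions made here for illustration, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def is_solution(A, B, candidate):
    """Return True if `candidate` solves every equation of the system AX = B."""
    for row, b in zip(A, B):
        lhs = sum(Fraction(a) * Fraction(x) for a, x in zip(row, candidate))
        if lhs != b:
            return False
    return True

# First system of Exercise 6.1.4: x + y = 0, x - y = 2.
A1, B1 = [[1, 1], [1, -1]], [0, 2]
# Second system: x + y = 0, x + y = 1 (no pair can satisfy both equations).
A2, B2 = [[1, 1], [1, 1]], [0, 1]

print(is_solution(A1, B1, (1, -1)))   # True
print(is_solution(A2, B2, (1, -1)))   # False: the second equation fails
```

A `False` answer only rules out one candidate; showing that a system has no solution at all requires the rank arguments developed later in the chapter.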


In the present chapter we study conditions under which a linear system is compatible and, in such a case, find methods to determine its space of solutions. We shall make systematic use of the matrix formalism described in Chaps. 4 and 5.


Definition 6.1.5 There are two matrices naturally associated to the linear system $\Sigma$ as given in Definition 6.1.3:

1. the matrix of the coefficients of $\Sigma$, $A = (a_{ij}) \in \mathbb{R}^{m,n}$,

2. the matrix of the inhomogeneous terms of $\Sigma$, $B = {}^t(b_1, \dots, b_m) \in \mathbb{R}^{m,1}$.

The complete or augmented matrix of the linear system $\Sigma$ is given by

$$(A, B) = (a_{ij} \mid b_i) = \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right).$$

By using these matrices the system $\Sigma$ can be represented as follows

$$\Sigma : \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$$

or more succinctly as

$$\Sigma : AX = B$$

where the array of unknowns is written as $X = {}^t(x_1, \dots, x_n)$ and (abusing notation) thought of as an element in $\mathbb{R}^{n,1}$.

Definition 6.1.6 Two linear systems $\Sigma : AX = B$ and $\Sigma' : A'X = B'$ are called equivalent if their spaces of solutions coincide, that is $\Sigma \sim \Sigma'$ if $S_\Sigma = S_{\Sigma'}$. Notice that the vector of unknowns for the two systems is the same.

Remark 6.1.7 The linear systems $AX = B$ and $A'X = B'$ are trivially equivalent

• if $(A', B')$ results from $(A, B)$ by adding null rows,

• if $(A', B')$ is given by a row permutation of $(A, B)$.

The following linear systems are evidently equivalent:

$$\begin{cases} x + y = 0 \\ x - y = 2 \end{cases}, \qquad \begin{cases} x - y = 2 \\ x + y = 0 \end{cases}.$$


Remark 6.1.8 Notice that under a permutation of the columns of the matrix of its coefficients a linear system $\Sigma$ changes to a system that is in general not equivalent to the starting one. As an example, consider the compatible linear system $AX = B$ given in Exercise 6.1.4. If the columns of $A$ are swapped one has

$$(A, B) = \left(\begin{array}{cc|c} 1 & 1 & 0 \\ 1 & -1 & 2 \end{array}\right) \xrightarrow{\;C_1 \leftrightarrow C_2\;} \left(\begin{array}{cc|c} 1 & 1 & 0 \\ -1 & 1 & 2 \end{array}\right) = (A', B).$$

One checks that the solution $(1, -1)$ of the starting system is not a solution for the system $A'X = B$.
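The check mentioned in Remark 6.1.8 can be written out explicitly. In the sketch below, $A'$ denotes the coefficient matrix with swapped columns; the last observation, on which pair does solve the swapped system, is an added remark rather than part of the original text.

```latex
A'\begin{pmatrix} 1 \\ -1 \end{pmatrix}
  = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}
  = \begin{pmatrix} 0 \\ -2 \end{pmatrix}
  \neq \begin{pmatrix} 0 \\ 2 \end{pmatrix} = B,
\qquad
A'\begin{pmatrix} -1 \\ 1 \end{pmatrix}
  = \begin{pmatrix} 0 \\ 2 \end{pmatrix} = B .
```

The swapped system is thus solved by $(-1, 1)$, the original solution with its entries exchanged: swapping two columns amounts to renaming the corresponding unknowns.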


6.2  The  Space  of  Solutions  for  Reduced  Systems 

Definition 6.2.1 A linear system $AX = B$ is called reduced if the matrix $A$ of its coefficients is reduced by rows in the sense of Sect. 4.4.

Solving a reduced system is quite elementary, as the following exercises show.

Exercise 6.2.2 Let the linear system $\Sigma$ be given by

$$\begin{cases} x + y + 2z = 4 \\ y - 2z = -3 \\ z = 2 \end{cases} \qquad \text{with} \qquad (A, B) = \left(\begin{array}{ccc|c} 1 & 1 & 2 & 4 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & 1 & 2 \end{array}\right).$$

It is reduced, and has the only solution $(x, y, z) = (-1, 1, 2)$. This is easily found by noticing that the third equation gives $z = 2$. By inserting this value into the second equation one has $y = 1$, and by inserting both these values into the first equation one eventually gets $x = -1$.
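The bottom-up substitution used in Exercise 6.2.2 can be sketched in a few lines. The helper name `back_substitute` is an assumption introduced for this illustration, and the sketch only covers the case of a square upper triangular matrix with non-zero diagonal entries.

```python
from fractions import Fraction

def back_substitute(U, b):
    """Solve Ux = b for an upper triangular U with non-zero diagonal entries."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # start from the bottom row
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / U[i][i]
    return x

# Exercise 6.2.2: x + y + 2z = 4, y - 2z = -3, z = 2.
U = [[1, 1, 2], [0, 1, -2], [0, 0, 1]]
b = [4, -3, 2]
print(back_substitute(U, b))   # [Fraction(-1, 1), Fraction(1, 1), Fraction(2, 1)]
```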

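Exercise 6.2.3 To solve the linear system

$$\begin{cases} 2x + y + 2z + t = 1 \\ 2x + 3y - z = 3 \\ x + z = 0 \end{cases} \qquad \text{with} \qquad (A, B) = \left(\begin{array}{cccc|c} 2 & 1 & 2 & 1 & 1 \\ 2 & 3 & -1 & 0 & 3 \\ 1 & 0 & 1 & 0 & 0 \end{array}\right)$$

one proceeds as in the previous exercise. The last equation gives $z = -x$. By setting $x = \tau$, one gets the solutions $(x, y, z, t) = (\tau, -\tau + 1, -\tau, \tau)$ with $\tau \in \mathbb{R}$. Clearly $\Sigma$ has an infinite number of solutions: the space of solutions for $\Sigma$ is in bijection with the elements $\tau \in \mathbb{R}$.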

Exercise 6.2.4 The linear system $\Sigma : AX = B$, with

$$(A, B) = \left(\begin{array}{ccc|c} 1 & 2 & 1 & 3 \\ 0 & -1 & 2 & 1 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 1 \end{array}\right)$$

is trivially not compatible since the last equation would give $0 = 1$.

Remark 6.2.5 If $A$ is reduced by rows, Exercises 6.2.2 and 6.2.3 show that one first determines the value of the unknown corresponding to the pivot (special) element of the bottom row and then replaces such unknown by its value in the remaining equations. This amounts to deleting, or eliminating, one of the unknowns. Upon iterating this procedure one completely solves the system. This procedure is shown in the following displays, where the pivot elements are in bold type:

$$(A, B) = \left(\begin{array}{ccc|c} \mathbf{1} & 1 & 2 & 4 \\ 0 & \mathbf{1} & -2 & -3 \\ 0 & 0 & \mathbf{1} & 2 \end{array}\right).$$

Here one determines $z$ at first, then $y$ and finally $x$. As for Exercise 6.2.3, one writes

$$(A, B) = \left(\begin{array}{cccc|c} 2 & 1 & 2 & \mathbf{1} & 1 \\ 2 & \mathbf{3} & -1 & 0 & 3 \\ 1 & 0 & \mathbf{1} & 0 & 0 \end{array}\right)$$

where one determines $z$, then $y$ and after those one determines $t$.

The previous exercises suggest the following method, which we describe as a proposition.

Proposition 6.2.6 (The method of eliminations) Let $\Sigma : AX = B$ be a reduced system.

(1) From Remark 6.1.7 we may assume that $(A, B)$ has no null rows.

(2) If $A$ has null rows they correspond to equations like $0 = b_i$ with $b_i \neq 0$, since the augmented matrix $(A, B)$ has no null rows. This means that the system is not compatible, $S_\Sigma = \emptyset$.

(3) If $A$ has no null rows, then $m \leq n$. Since $A$ is reduced, it has m pivot elements, so its rank is m. Starting from the bottom row one can then determine the unknown corresponding to the pivot element and then, by substituting such an unknown in the remaining equations, iterate the procedure thus determining the space of solutions.

We describe the general procedure when $A$ is a complete upper triangular matrix,

$$(A, B) = \left(\begin{array}{cccccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1m} & * & \cdots & a_{1n} & b_1 \\ 0 & a_{22} & a_{23} & \cdots & a_{2m} & * & \cdots & a_{2n} & b_2 \\ 0 & 0 & a_{33} & \cdots & a_{3m} & * & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{mm} & * & \cdots & a_{mn} & b_m \end{array}\right)$$

with all diagonal elements $a_{ii} \neq 0$. The equation corresponding to the bottom line of the matrix is

$$a_{mm}x_m + a_{m\,m+1}x_{m+1} + \cdots + a_{mn}x_n = b_m$$

with $a_{mm} \neq 0$. By dividing both sides of the equation by $a_{mm}$, one has

$$x_m = a_{mm}^{-1}\left(b_m - a_{m\,m+1}x_{m+1} - \cdots - a_{mn}x_n\right).$$

Then $x_m$ is a function of $x_{m+1}, \dots, x_n$. From the $(m-1)$-th row one analogously obtains

$$x_{m-1} = a_{m-1\,m-1}^{-1}\left(b_{m-1} - a_{m-1\,m}x_m - a_{m-1\,m+1}x_{m+1} - \cdots - a_{m-1\,n}x_n\right).$$

By replacing $x_m$ with its value (as a function of $x_{m+1}, \dots, x_n$) previously determined, one writes $x_{m-1}$ as a function of the last unknowns $x_{m+1}, \dots, x_n$. Iterating this process, one writes the unknowns $x_{m-2}, x_{m-3}, \dots, x_1$ as functions of the remaining ones $x_{m+1}, \dots, x_n$.
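The general procedure can be sketched as follows, for a complete upper triangular matrix with $m \leq n$ and non-zero diagonal entries. The function name `solve_with_free_unknowns` and the numerical system used to exercise it are hypothetical, chosen only for illustration: the caller fixes values for the last $n - m$ unknowns and the first m are then computed from the bottom row upwards.

```python
from fractions import Fraction

def solve_with_free_unknowns(A, B, free_values):
    """Back substitution for an m x n system (m <= n) whose first m columns
    form an upper triangular block with non-zero diagonal entries.
    `free_values` fixes the n - m unknowns x_{m+1}, ..., x_n."""
    m, n = len(A), len(A[0])
    assert len(free_values) == n - m
    x = [Fraction(0)] * m + [Fraction(v) for v in free_values]
    for i in range(m - 1, -1, -1):
        # everything to the right of the diagonal entry is already known
        rest = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(B[i]) - rest) / A[i][i]
    return x

# Hypothetical example with m = 2, n = 4:
#   x1 + 2*x2 +   x3 -   x4 = 1
#          x2 - 3*x3 + 2*x4 = 0
A = [[1, 2, 1, -1], [0, 1, -3, 2]]
B = [1, 0]
print(solve_with_free_unknowns(A, B, [1, 0]))
print(solve_with_free_unknowns(A, B, [0, 1]))
```

Each choice of values for the last unknowns produces exactly one solution, which is the content of the remark that follows.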

Remark 6.2.7 Since the m unknowns $x_1, \dots, x_m$ can be expressed as functions of the remaining ones, the $n - m$ unknowns $x_{m+1}, \dots, x_n$, the latter are said to be free unknowns. By choosing arbitrary numerical values for them, $x_{m+1} = \lambda_1, \dots, x_n = \lambda_{n-m}$, with $\lambda_i \in \mathbb{R}$, one obtains a solution of the linear system, since the matrix $A$ is reduced. This allows one to define a bijection

$$\mathbb{R}^{n-m} \leftrightarrow S_\Sigma$$

where n is the number of unknowns of $\Sigma$ and $m = \mathrm{rk}(A)$. One usually labels this result by saying that the linear system has $\infty^{n-m}$ solutions.


6.3  The  Space  of  Solutions  for  a  General  Linear  System 


One of the possible methods to solve a general linear system $AX = B$ uses the notion of row reduction for a matrix as described in Sect. 4.4. From the definition at the beginning of that section one has the following result.

Theorem 6.3.1 Let $\Sigma : AX = B$ be a linear system, and let $(A', B')$ be a matrix obtained from $(A, B)$ by elementary row transformations. The linear system $\Sigma$ and the transformed one $\Sigma' : A'X = B'$ are equivalent.

Proof We denote as usual $A = (a_{ij})$ and $B = {}^t(b_1, \dots, b_m)$. If $(A', B')$ is obtained from $(A, B)$ under a type $(e)$ elementary transformation, the claim is obvious as seen in Remark 6.1.7. If $(A', B')$ is obtained from $(A, B)$ under a type $(\lambda)$ transformation by the row $R_i$ the claim follows by noticing that, for any $\lambda \neq 0$, the linear equation

$$a_{i1}x_1 + \cdots + a_{in}x_n = b_i$$

is equivalent to the equation

$$\lambda a_{i1}x_1 + \cdots + \lambda a_{in}x_n = \lambda b_i.$$

Let now $(A', B')$ be obtained from $(A, B)$ via a type $(D)$ elementary transformation,

$$R_i \mapsto R_i + \lambda R_j$$

with $j \neq i$. To be definite we take $i = 2$ and $j = 1$. We then have

$$(A', B') = \begin{pmatrix} R_1 \\ R_2 + \lambda R_1 \\ \vdots \\ R_m \end{pmatrix}.$$

Let us assume that $\alpha = (\alpha_1, \dots, \alpha_n)$ is a solution for $\Sigma$, that is

$$a_{i1}\alpha_1 + \cdots + a_{in}\alpha_n = b_i$$

for any $i = 1, \dots, m$. That all but the second equation of $\Sigma'$ are solved by $\alpha$ is obvious; it remains to verify that $\alpha$ also solves the second equation in it, that is, to show that

$$(a_{21} + \lambda a_{11})\alpha_1 + \cdots + (a_{2n} + \lambda a_{1n})\alpha_n = b_2 + \lambda b_1.$$

If we add the equation for $i = 2$ to $\lambda$ times the equation for $i = 1$, we obtain

$$(a_{21} + \lambda a_{11})\alpha_1 + \cdots + (a_{2n} + \lambda a_{1n})\alpha_n = b_2 + \lambda b_1,$$

thus $(\alpha_1, \dots, \alpha_n)$ is a solution for $\Sigma'$ and $S_\Sigma \subseteq S_{\Sigma'}$. The inclusion $S_{\Sigma'} \subseteq S_\Sigma$ is proven in an analogous way. □

By using the above theorem one proves a general method to solve linear systems known as Gauss' elimination method or Gauss' algorithm.

Theorem 6.3.2 The space of the solutions of the linear system $\Sigma : AX = B$ is determined via the following steps.

(1) Reduce by rows the matrix $(A, B)$ to $(A', B')$ with $A'$ reduced by rows.

(2) Using the method given in Proposition 6.2.6 determine the space $S_{\Sigma'}$ of the solutions for the system $\Sigma' : A'X = B'$.

(3) From Theorem 6.3.1 it is $\Sigma \sim \Sigma'$, that is $S_\Sigma = S_{\Sigma'}$.
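Step (1) can be sketched in code. The function `row_reduce` below is an illustrative assumption, not the book's own procedure: it always produces an echelon form, which is one particular way of obtaining a matrix reduced by rows, using only row swaps and transformations of the kind $R_i \mapsto R_i + cR_j$.

```python
from fractions import Fraction

def row_reduce(M):
    """Reduce M by rows to echelon form using row swaps and
    transformations R_i -> R_i + c R_j. Returns (reduced matrix, pivot columns)."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]                 # row swap
        for i in range(r + 1, rows):            # clear the entries below the pivot
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return M, pivots

# Complete matrix (A, B) of the system 2x + y + z = 1, x - y - z = 0, x + 2y + 2z = 1.
AB = [[2, 1, 1, 1], [1, -1, -1, 0], [1, 2, 2, 1]]
reduced, pivots = row_reduce(AB)
for row in reduced:
    print([str(v) for v in row])
print("pivot columns:", pivots)
```

For this matrix the pivots fall in the first two columns and the last row becomes null, so both $A$ and $(A, B)$ have rank 2. The intermediate matrices differ from those obtained by hand with other row operations, but by Theorem 6.3.1 the resulting systems are all equivalent.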

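Exercise 6.3.3 Let us solve the following linear system

$$\Sigma : \begin{cases} 2x + y + z = 1 \\ x - y - z = 0 \\ x + 2y + 2z = 1 \end{cases}$$

whose complete matrix is

$$(A, B) = \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 1 & -1 & -1 & 0 \\ 1 & 2 & 2 & 1 \end{array}\right).$$

By reducing such a matrix by rows, we have

$$(A, B) \xrightarrow[R_3 \mapsto R_3 - 2R_1]{R_2 \mapsto R_2 + R_1} \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 3 & 0 & 0 & 1 \\ -3 & 0 & 0 & -1 \end{array}\right) \xrightarrow{R_3 \mapsto R_3 + R_2} \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 3 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{array}\right) = (A', B').$$

Since $A'$ is reduced the linear system $\Sigma' : A'X = B'$ is reduced and then solvable by Gauss' method. We have

$$\Sigma' : \begin{cases} 2x + y + z = 1 \\ 3x = 1 \end{cases} \quad \Longrightarrow \quad \begin{cases} y + z = \tfrac{1}{3} \\ x = \tfrac{1}{3} \end{cases}.$$

It is now clear that one unknown is free, so the linear system has $\infty^1$ solutions. By choosing $z = \lambda$ the space of solutions for $\Sigma$ is

$$S_\Sigma = \{(x, y, z) \in \mathbb{R}^3 \mid (x, y, z) = (\tfrac{1}{3}, \tfrac{1}{3} - \lambda, \lambda),\ \lambda \in \mathbb{R}\}.$$

On the other hand, by choosing $y = \alpha$ the space can be written as

$$S_\Sigma = \{(x, y, z) \in \mathbb{R}^3 \mid (x, y, z) = (\tfrac{1}{3}, \alpha, \tfrac{1}{3} - \alpha),\ \alpha \in \mathbb{R}\}.$$

It is obvious that we are representing the same subset $S_\Sigma \subset \mathbb{R}^3$ in two different ways.

Notice that the number of free unknowns is the difference between the total number of unknowns and the rank of the matrix $A$.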
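That the two descriptions of the space of solutions in Exercise 6.3.3 agree can be seen through the change of parameter $\alpha = \tfrac{1}{3} - \lambda$, which is a bijection of $\mathbb{R}$ onto itself. The following short computation is an added check and uses only the equations of the exercise.

```latex
\Bigl(\tfrac13,\ \tfrac13-\lambda,\ \lambda\Bigr)\Big|_{\lambda=\frac13-\alpha}
  = \Bigl(\tfrac13,\ \alpha,\ \tfrac13-\alpha\Bigr),
\qquad
\begin{aligned}
2\cdot\tfrac13 + \bigl(\tfrac13-\lambda\bigr) + \lambda &= 1, \\
\tfrac13 - \bigl(\tfrac13-\lambda\bigr) - \lambda &= 0, \\
\tfrac13 + 2\bigl(\tfrac13-\lambda\bigr) + 2\lambda &= 1 .
\end{aligned}
```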


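Exercise 6.3.4 Let us solve the following linear system,

$$\Sigma : \begin{cases} x + y - z = 0 \\ 2x - y = 1 \\ y + 2z = 2 \end{cases}$$

whose complete matrix is

$$(A, B) = \left(\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 2 & -1 & 0 & 1 \\ 0 & 1 & 2 & 2 \end{array}\right).$$

The reduction procedure gives

$$(A, B) \xrightarrow{R_2 \mapsto R_2 - 2R_1} \left(\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -3 & 2 & 1 \\ 0 & 1 & 2 & 2 \end{array}\right) \xrightarrow{R_3 \mapsto R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -3 & 2 & 1 \\ 0 & 4 & 0 & 1 \end{array}\right) = (A', B').$$

Since $A'$ is reduced the linear system $\Sigma' : A'X = B'$ is reduced with no free unknowns. This means that $S_{\Sigma'}$ (and then $S_\Sigma$) has $\infty^0 = 1$ solution. Gauss' method provides us with a way to find such a solution, namely

$$\Sigma' : \begin{cases} x + y - z = 0 \\ -3y + 2z = 1 \\ 4y = 1 \end{cases} \quad \Longrightarrow \quad \begin{cases} x = \tfrac{5}{8} \\ z = \tfrac{7}{8} \\ y = \tfrac{1}{4} \end{cases}.$$

This gives $S_\Sigma = \{(x, y, z) = (\tfrac{5}{8}, \tfrac{1}{4}, \tfrac{7}{8})\}$. Once more the number of free unknowns is the difference between the total number of unknowns and the rank of the matrix $A$.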


The  following  exercise  shows  how  to  solve  a  linear  system  with  one  coefficient 
given  by  a  real  parameter  instead  of  a  fixed  real  number.  By  solving  such  a  system  we 
mean  to  analyse  the  conditions  on  the  parameter  under  which  the  system  is  solvable 
and  to  provide  its  space  of  solutions  as  depending  on  the  possible  values  of  the 
parameter. 

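Exercise 6.3.5 Let us study the following linear system,

$$\Sigma_\lambda : \begin{cases} x + 2y + z + t = -1 \\ x + y - z + 2t = 1 \\ 2x + \lambda y + \lambda t = 0 \\ -\lambda y - 2z + \lambda t = 2 \end{cases}$$

with $\lambda \in \mathbb{R}$. When the complete matrix for such a system is reduced, particular care must be taken for some critical values of $\lambda$. We have

$$(A, B) = \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & -1 \\ 1 & 1 & -1 & 2 & 1 \\ 2 & \lambda & 0 & \lambda & 0 \\ 0 & -\lambda & -2 & \lambda & 2 \end{array}\right) \xrightarrow[R_3 \mapsto R_3 - 2R_1]{R_2 \mapsto R_2 - R_1} \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & -1 \\ 0 & -1 & -2 & 1 & 2 \\ 0 & \lambda - 4 & -2 & \lambda - 2 & 2 \\ 0 & -\lambda & -2 & \lambda & 2 \end{array}\right)$$

$$\xrightarrow[R_4 \mapsto R_4 - R_2]{R_3 \mapsto R_3 - R_2} \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & -1 \\ 0 & -1 & -2 & 1 & 2 \\ 0 & \lambda - 3 & 0 & \lambda - 3 & 0 \\ 0 & -\lambda + 1 & 0 & \lambda - 1 & 0 \end{array}\right) = (A', B').$$

The transformations $R_3 \mapsto R_3 + R_4$, then $R_3 \mapsto \tfrac{1}{2}R_3$ and finally $R_4 \mapsto R_4 + (1 - \lambda)R_3$ give a further reduction of $(A', B')$ as

$$\left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & -1 \\ 0 & -1 & -2 & 1 & 2 \\ 0 & -1 & 0 & \lambda - 2 & 0 \\ 0 & -\lambda + 1 & 0 & \lambda - 1 & 0 \end{array}\right) \mapsto \left(\begin{array}{cccc|c} 1 & 2 & 1 & 1 & -1 \\ 0 & -1 & -2 & 1 & 2 \\ 0 & -1 & 0 & \lambda - 2 & 0 \\ 0 & 0 & 0 & a_{44} & 0 \end{array}\right) = (A'', B'')$$

with $a_{44} = (1 - \lambda)(\lambda - 3)$. Notice that the last transformation is meaningful for any $\lambda \in \mathbb{R}$. In the reduced form $(A'', B'')$ we have that $R_4$ is null if and only if either $\lambda = 3$ or $\lambda = 1$. For such values of the parameter $\lambda$ either $R_3$ or $R_4$ in $A'$ is indeed null. We can now conclude that $\Sigma_\lambda$ is solvable for any value of $\lambda \in \mathbb{R}$ and we have

• If $\lambda \in \{1, 3\}$ then $a_{44} = 0$, so $\mathrm{rk}(A) = 3$ and $\Sigma_\lambda$ has $\infty^1$ solutions,

• If $\lambda \notin \{1, 3\}$ then $a_{44} \neq 0$, so $\mathrm{rk}(A) = 4$ and $\Sigma_\lambda$ has a unique solution.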

We can now study the following three cases:

(a) $\lambda \notin \{1, 3\}$, that is

$$\Sigma_\lambda : \begin{cases} x + 2y + z + t = -1 \\ -y - 2z + t = 2 \\ -y + (\lambda - 2)t = 0 \\ (1 - \lambda)(\lambda - 3)t = 0 \end{cases}.$$

From our assumption, we have that $a_{44} = (1 - \lambda)(\lambda - 3) \neq 0$ so we get $t = 0$. By using Gauss' method we then write

$$\begin{cases} x = 0 \\ z = -1 \\ y = 0 \\ t = 0 \end{cases}.$$

This shows that for $\lambda \neq 1, 3$ the space $S_{\Sigma_\lambda}$ does not depend on $\lambda$.

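(b) If $\lambda = 1$ we can delete the fourth equation since it is a trivial identity. We have then

$$\Sigma_{\lambda=1} : \begin{cases} x + 2y + z + t = -1 \\ -y - 2z + t = 2 \\ y + t = 0 \end{cases}.$$

Gauss' method gives us

$$\begin{cases} x = 0 \\ z = t - 1 \\ y = -t \end{cases}$$

and this set of solutions can be written as

$$S_{\Sigma_{\lambda=1}} = \{(x, y, z, t) \in \mathbb{R}^4 \mid (x, y, z, t) = (0, -\alpha, \alpha - 1, \alpha),\ \alpha \in \mathbb{R}\}.$$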


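(c) If $\lambda = 3$ the non trivial part of the system turns out to be

$$\Sigma_{\lambda=3} : \begin{cases} x + 2y + z + t = -1 \\ -y - 2z + t = 2 \\ -y + t = 0 \end{cases}$$

and we write the solutions as

$$\begin{cases} x = -3t \\ z = -1 \\ y = t \end{cases}$$

or equivalently $S_{\Sigma_{\lambda=3}} = \{(x, y, z, t) \in \mathbb{R}^4 \mid (x, y, z, t) = (-3\alpha, \alpha, -1, \alpha),\ \alpha \in \mathbb{R}\}$.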
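The three families of solutions found in Exercise 6.3.5 can be checked by direct substitution into the original four equations. The sketch below does this for a few sampled values of the parameters; the helper names `residuals` and `is_zero` are assumptions introduced for the check, and a sampled check is of course weaker than the derivation above.

```python
from fractions import Fraction

def residuals(lam, x, y, z, t):
    """Left-hand side minus right-hand side for the four equations of Exercise 6.3.5."""
    return [
        x + 2 * y + z + t + 1,
        x + y - z + 2 * t - 1,
        2 * x + lam * y + lam * t,
        -lam * y - 2 * z + lam * t - 2,
    ]

def is_zero(values):
    return all(v == 0 for v in values)

# Case (a): lambda not in {1, 3}, unique solution (0, 0, -1, 0).
for lam in (Fraction(-2), Fraction(0), Fraction(1, 2), Fraction(7)):
    assert is_zero(residuals(lam, 0, 0, -1, 0))

# Cases (b) and (c): one-parameter families, sampled at a few values of alpha.
for alpha in (Fraction(-1), Fraction(0), Fraction(5, 3)):
    assert is_zero(residuals(1, 0, -alpha, alpha - 1, alpha))      # lambda = 1
    assert is_zero(residuals(3, -3 * alpha, alpha, -1, alpha))     # lambda = 3

print("all sampled solutions satisfy the system")
```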
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What  we  have  discussed  can  be  given  in  the  form  of  the  following  theorem  which 
provides  general  conditions  under  which  a  linear  system  is  solvable. 

Theorem 6.3.6 (Rouché-Capelli). The linear system $\Sigma : AX = B$ is solvable if and only if $\mathrm{rk}(A) = \mathrm{rk}(A, B)$. In such a case, denoting $\mathrm{rk}(A) = \mathrm{rk}(A, B) = \rho$ and with n the number of unknowns in $\Sigma$, the following holds true:

(a) the number of free unknowns is $n - \rho$,

(b) the $n - \rho$ free unknowns have to be selected in such a way that the remaining $\rho$ unknowns correspond to linearly independent columns of $A$.

Proof By noticing that the linear system $\Sigma$ can be written as

$$x_1 C_1 + \cdots + x_n C_n = B$$

with $C_1, \dots, C_n$ the columns of $A$, we see that $\Sigma$ is solvable if and only if $B$ is a linear combination of these columns, that is if and only if the linear span of the columns of $A$ coincides with the linear span of the columns of $(A, B)$. This condition is fulfilled if and only if $\mathrm{rk}(A) = \mathrm{rk}(A, B)$.

Suppose then that the system is solvable.

(a) Let $\Sigma' : A'X = B'$ be the system obtained from $(A, B)$ by reduction by rows. From Remark 6.2.7 the system $\Sigma'$ has $n - \mathrm{rk}(A')$ free unknowns. Since $\Sigma \sim \Sigma'$ and $\mathrm{rk}(A) = \mathrm{rk}(A')$ the claim follows.

(b) Possibly with a swap of the columns in $A = (C_1, \dots, C_n)$ (which amounts to renaming the unknowns), the result that we aim to prove is the following:

$$x_{\rho+1}, \dots, x_n \text{ are free} \quad \Longleftrightarrow \quad C_1, \dots, C_\rho \text{ are linearly independent.}$$

Let us at first suppose that $C_1, \dots, C_\rho$ are linearly independent. By a possible reduction and a swapping of some equations, with $\mathrm{rk}(A) = \mathrm{rk}(A, B) = \rho$, the matrix for the system can be written as

$$(A', B') = \left(\begin{array}{cccccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1\rho} & * & \cdots & * & b_1 \\ 0 & a_{22} & a_{23} & \cdots & a_{2\rho} & * & \cdots & * & b_2 \\ 0 & 0 & a_{33} & \cdots & a_{3\rho} & * & \cdots & * & b_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & a_{\rho\rho} & * & \cdots & * & b_\rho \\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 \end{array}\right).$$

The claim, namely that $x_{\rho+1}, \dots, x_n$ can be taken to be free, follows easily from Gauss' method.

On the other hand, let us assume that $x_{\rho+1}, \dots, x_n$ are free unknowns for the linear system and let us also suppose that $C_1, \dots, C_\rho$ are linearly dependent. This would result in the rank of the matrix $(C_1, \dots, C_\rho)$ being less than $\rho$ and there would exist a reduction of $(A, B)$ for which the matrix of the linear system turns out to be

$$(A', B') = \left(\begin{array}{ccc|ccc} a_{11} & \cdots & a_{1\rho} & * & \cdots & * \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{\rho-1\,1} & \cdots & a_{\rho-1\,\rho} & * & \cdots & * \\ 0 & \cdots & 0 & * & \cdots & * \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & * & \cdots & * \end{array}\right).$$

Since $\mathrm{rk}(A', B') = \mathrm{rk}(A, B) = \rho$ there would then be a non zero row $R_i$ in $(A', B')$ with $i \geq \rho$. The equation corresponding to such an $R_i$, not depending on the first $\rho$ unknowns, would provide a relation among the $x_{\rho+1}, \dots, x_n$, which would then be not free. □
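The Rouché-Capelli criterion translates directly into a small test on ranks. In the sketch below the rank is computed by counting pivots in a row reduction; the function names `rank` and `classify`, and the convention of returning `None` for an incompatible system, are assumptions made for this illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of M, computed as the number of pivots found by row reduction."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def classify(A, B):
    """Return None if AX = B is not solvable, else the number n - rk(A) of free unknowns."""
    augmented = [row + [b] for row, b in zip(A, B)]
    rho = rank(A)
    if rho != rank(augmented):
        return None
    return len(A[0]) - rho

print(classify([[1, 1], [1, -1]], [0, 2]))                        # 0
print(classify([[1, 1], [1, 1]], [0, 1]))                         # None
print(classify([[2, 1, 1], [1, -1, -1], [1, 2, 2]], [1, 0, 1]))   # 1
```

The three calls correspond to the two systems of Exercise 6.1.4 and to the system of Exercise 6.3.3, giving a unique solution, no solution, and $\infty^1$ solutions respectively.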


Remark 6.3.7 If the linear system $\Sigma : AX = B$, with n unknowns and m equations, is solvable with $\mathrm{rk}(A) = \rho$, then

(i) $\Sigma$ is equivalent to a linear system $\Sigma'$ with $\rho$ equations arbitrarily chosen among the m equations in $\Sigma$, provided they are linearly independent.

(ii) there is a bijection between the space $S_\Sigma$ and $\mathbb{R}^{n-\rho}$.

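Exercise 6.3.8 Let us solve the following linear system depending on a parameter $\lambda \in \mathbb{R}$,

$$\Sigma_\lambda : \begin{cases} \lambda x + z = -1 \\ x + (\lambda - 1)y + 2z = 1 \\ x + (\lambda - 1)y + 3z = 0 \end{cases}.$$

We reduce by rows the complete matrix corresponding to $\Sigma_\lambda$ as

$$(A, B) = \left(\begin{array}{ccc|c} \lambda & 0 & 1 & -1 \\ 1 & \lambda - 1 & 2 & 1 \\ 1 & \lambda - 1 & 3 & 0 \end{array}\right) \xrightarrow[R_3 \mapsto R_3 - 3R_1]{R_2 \mapsto R_2 - 2R_1} \left(\begin{array}{ccc|c} \lambda & 0 & 1 & -1 \\ 1 - 2\lambda & \lambda - 1 & 0 & 3 \\ 1 - 3\lambda & \lambda - 1 & 0 & 3 \end{array}\right)$$

$$\xrightarrow{R_3 \mapsto R_3 - R_2} \left(\begin{array}{ccc|c} \lambda & 0 & 1 & -1 \\ 1 - 2\lambda & \lambda - 1 & 0 & 3 \\ -\lambda & 0 & 0 & 0 \end{array}\right) = (A', B').$$

Depending on the values of the parameter $\lambda$ we have the following cases.

(a) If $\lambda = 1$, the matrix $A'$ is not reduced. We then write

$$(A', B') = \left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ -1 & 0 & 0 & 3 \\ -1 & 0 & 0 & 0 \end{array}\right) \xrightarrow{R_3 \mapsto R_3 - R_2} \left(\begin{array}{ccc|c} 1 & 0 & 1 & -1 \\ -1 & 0 & 0 & 3 \\ 0 & 0 & 0 & -3 \end{array}\right).$$

The last row gives the equation $0 = -3$ and in this case the system has no solution.

(b) If $\lambda \neq 1$ the matrix $A'$ is reduced, so we have:

• If $\lambda \neq 0$, then $\mathrm{rk}(A) = 3 = \mathrm{rk}(A, B)$, so the linear system $\Sigma_\lambda$ has a unique solution. With $\lambda \notin \{0, 1\}$ the reduced system is

$$\Sigma'_\lambda : \begin{cases} \lambda x + z = -1 \\ (1 - 2\lambda)x + (\lambda - 1)y = 3 \\ -\lambda x = 0 \end{cases}$$

and Gauss' method gives $S_{\Sigma_\lambda} = \{(x, y, z) = (0, 3/(\lambda - 1), -1)\}$.

• If $\lambda = 0$ the system we have to solve is

$$\Sigma'_{\lambda=0} : \begin{cases} z = -1 \\ x - y = 3 \end{cases}$$

whose solutions are given as

$$S_{\Sigma_{\lambda=0}} = \{(x, y, z) \in \mathbb{R}^3 \mid (x, y, z) = (\alpha + 3, \alpha, -1),\ \alpha \in \mathbb{R}\}.$$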

Exercise 6.3.9 Let us show that the following system of vectors,

$$v_1 = (1, 1, 0), \quad v_2 = (0, 1, 1), \quad v_3 = (1, 0, 1),$$

is free and then write $v = (1, 1, 1)$ as a linear combination of $v_1, v_2, v_3$.

We start by recalling that $v_1, v_2, v_3$ are linearly independent if and only if the rank of the matrix whose columns are the vectors themselves is 3. We have the following reduction,

$$(v_1\ v_2\ v_3) = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{pmatrix}.$$

The number of non zero rows of the reduced matrix is 3 so the vectors $v_1, v_2, v_3$ are linearly independent. Then they are a basis for $\mathbb{R}^3$, so the following relation,

$$x v_1 + y v_2 + z v_3 = v$$

is fulfilled by a unique triple $(x, y, z)$ of coefficients for any $v \in \mathbb{R}^3$. Such a triple is the unique solution of the linear system whose complete matrix is $(A, B) = (v_1\ v_2\ v_3\ v)$. For the case we are considering in this exercise we have

$$(A, B) = \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{array}\right).$$

Using for $(A, B)$ the same reduction we used above for $A$ we have

$$(A, B) \mapsto \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 1 & 1 & 1 \end{array}\right) \mapsto \left(\begin{array}{ccc|c} 1 & 0 & 1 & 1 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 2 & 1 \end{array}\right).$$

The linear system we have then to solve is

$$\begin{cases} x + z = 1 \\ y - z = 0 \\ 2z = 1 \end{cases}$$

giving $(x, y, z) = \tfrac{1}{2}(1, 1, 1)$. One can indeed directly compute that

$$\tfrac{1}{2}(1, 1, 0) + \tfrac{1}{2}(0, 1, 1) + \tfrac{1}{2}(1, 0, 1) = (1, 1, 1).$$
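The final identity of Exercise 6.3.9 can also be confirmed mechanically. The following sketch forms the linear combination with coefficients all equal to one half; the helper name `combine` is an assumption introduced here for illustration.

```python
from fractions import Fraction

v1, v2, v3 = (1, 1, 0), (0, 1, 1), (1, 0, 1)
v = (1, 1, 1)

def combine(coeffs, vectors):
    """Return the linear combination sum_k coeffs[k] * vectors[k]."""
    return tuple(sum(c * w[i] for c, w in zip(coeffs, vectors))
                 for i in range(len(vectors[0])))

half = Fraction(1, 2)
print(combine((half, half, half), (v1, v2, v3)) == v)   # True
```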

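Exercise 6.3.10 Let us consider the matrix

$$M_\lambda = \begin{pmatrix} \lambda & 1 \\ 1 & \lambda \end{pmatrix}$$

with $\lambda \in \mathbb{R}$. We compute its inverse using the theory of linear systems. We can indeed write the problem in terms of the linear system

$$\begin{pmatrix} \lambda & 1 \\ 1 & \lambda \end{pmatrix} \begin{pmatrix} x & y \\ z & t \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

that is

$$\Sigma : \begin{cases} \lambda x + z = 1 \\ x + \lambda z = 0 \\ \lambda y + t = 0 \\ y + \lambda t = 1 \end{cases}.$$

We reduce the complete matrix of the linear system as follows:

$$(A, B) = \left(\begin{array}{cccc|c} \lambda & 0 & 1 & 0 & 1 \\ 1 & 0 & \lambda & 0 & 0 \\ 0 & \lambda & 0 & 1 & 0 \\ 0 & 1 & 0 & \lambda & 1 \end{array}\right) \xrightarrow{R_2 \mapsto R_2 - \lambda R_1} \left(\begin{array}{cccc|c} \lambda & 0 & 1 & 0 & 1 \\ 1 - \lambda^2 & 0 & 0 & 0 & -\lambda \\ 0 & \lambda & 0 & 1 & 0 \\ 0 & 1 & 0 & \lambda & 1 \end{array}\right)$$

$$\xrightarrow{R_4 \mapsto R_4 - \lambda R_3} \left(\begin{array}{cccc|c} \lambda & 0 & 1 & 0 & 1 \\ 1 - \lambda^2 & 0 & 0 & 0 & -\lambda \\ 0 & \lambda & 0 & 1 & 0 \\ 0 & 1 - \lambda^2 & 0 & 0 & 1 \end{array}\right) = (A', B').$$

The elementary transformations we used are well defined for any real value of $\lambda$. We start by noticing that if $1 - \lambda^2 = 0$, that is $\lambda = \pm 1$, we have

$$(A', B') = \left(\begin{array}{cccc|c} \pm 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & \mp 1 \\ 0 & \pm 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right).$$

The second and the fourth rows of this matrix show that the corresponding linear system is incompatible. This means that when $\lambda = \pm 1$ the matrix $M_\lambda$ is not invertible (as we would immediately see by computing its determinant).

We assume next that $1 - \lambda^2 \neq 0$. In such a case we have $\mathrm{rk}(A) = \mathrm{rk}(A, B) = 4$, so there exists a unique solution for the linear system. We write it in the reduced form as

$$\Sigma' : \begin{cases} \lambda x + z = 1 \\ (1 - \lambda^2)x = -\lambda \\ \lambda y + t = 0 \\ (1 - \lambda^2)y = 1 \end{cases}.$$

Its solution is then

$$\begin{cases} z = 1/(1 - \lambda^2) \\ x = -\lambda/(1 - \lambda^2) \\ t = -\lambda/(1 - \lambda^2) \\ y = 1/(1 - \lambda^2) \end{cases}$$

that we write in matrix form as

$$M_\lambda^{-1} = \frac{1}{1 - \lambda^2} \begin{pmatrix} -\lambda & 1 \\ 1 & -\lambda \end{pmatrix}.$$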
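The result of Exercise 6.3.10 can be checked by multiplying back. The sketch below takes $M_\lambda$ to be the matrix with rows $(\lambda, 1)$ and $(1, \lambda)$, as encoded by the linear system above, and tests the candidate inverse with entries $x, y, z, t$ read off from the solution; the helper names `matmul2` and `inverse_candidate` are assumptions introduced for this check.

```python
from fractions import Fraction

def matmul2(P, Q):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_candidate(lam):
    """The matrix (1 / (1 - lam^2)) * [[-lam, 1], [1, -lam]], defined for lam != +1, -1."""
    d = 1 - lam * lam
    return [[-lam / d, 1 / d], [1 / d, -lam / d]]

identity = [[1, 0], [0, 1]]
for lam in (Fraction(0), Fraction(2), Fraction(-1, 3)):
    M = [[lam, 1], [1, lam]]
    assert matmul2(M, inverse_candidate(lam)) == identity
    assert matmul2(inverse_candidate(lam), M) == identity
print("inverse formula confirmed for the sampled values")
```

For $\lambda = \pm 1$ the function divides by zero, mirroring the fact that the system is incompatible for those values.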
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6.4  Homogeneous  Linear  Systems 

We now analyse an interesting class of linear systems (for ease of notation we write $0 = 0_{\mathbb{R}^m}$).


Definition 6.4.1 A linear system $\Sigma : AX = B$ is called homogeneous if $B = 0$.

Remark 6.4.2 A linear system $\Sigma : AX = 0$ with $A \in \mathbb{R}^{m,n}$ is always solvable since the null n-tuple (the null vector in $\mathbb{R}^n$) gives a solution for $\Sigma$, albeit a trivial one. This also follows from the Rouché-Capelli theorem since one obviously has $\mathrm{rk}(A) = \mathrm{rk}(A, 0)$. The same theorem allows one to conclude that such a trivial solution is indeed the only solution for $\Sigma$ if and only if $n = \rho = \mathrm{rk}(A)$.

Theorem 6.4.3 Let $\Sigma : AX = 0$ be a homogeneous linear system with $A \in \mathbb{R}^{m,n}$. Then $S_\Sigma$ is a vector subspace of $\mathbb{R}^n$ with $\dim S_\Sigma = n - \mathrm{rk}(A)$.

Proof From Proposition 2.2.2 we have to show that if $X_1, X_2 \in S_\Sigma$ with $\lambda_1, \lambda_2 \in \mathbb{R}$, then $\lambda_1 X_1 + \lambda_2 X_2$ is in $S_\Sigma$. Since by hypothesis we have $AX_1 = 0$ and $AX_2 = 0$ we have also $\lambda_1(AX_1) + \lambda_2(AX_2) = 0$. From the properties of the matrix calculus we have in turn $\lambda_1(AX_1) + \lambda_2(AX_2) = A(\lambda_1 X_1 + \lambda_2 X_2)$, thus giving $\lambda_1 X_1 + \lambda_2 X_2$ in $S_\Sigma$. We conclude that $S_\Sigma$ is a vector subspace of $\mathbb{R}^n$.

With $\rho = \mathrm{rk}(A)$, from the Rouché-Capelli theorem we know that $\Sigma$ has $n - \rho$ free unknowns. This number coincides with the dimension of $S_\Sigma$. To show this fact we determine a basis made up of $n - \rho$ elements. Let us assume for simplicity that the free unknowns are the last ones $x_{\rho+1}, \dots, x_n$. Any solution of $\Sigma$ can then be written as

$$(*, \dots, *, x_{\rho+1}, \dots, x_n)$$

where the $\rho$ symbols $*$ stand for the values of $x_1, \dots, x_\rho$ corresponding to each possible value of $x_{\rho+1}, \dots, x_n$. We let now the $(n - \rho)$-dimensional 'vector' $x_{\rho+1}, \dots, x_n$ range over all elements of the canonical basis of $\mathbb{R}^{n-\rho}$ and write the corresponding elements in $S_\Sigma$ as

$$\begin{aligned} v_1 &= (*, \dots, *, 1, 0, \dots, 0) \\ v_2 &= (*, \dots, *, 0, 1, \dots, 0) \\ &\ \ \vdots \\ v_{n-\rho} &= (*, \dots, *, 0, 0, \dots, 1). \end{aligned}$$

The rank of the matrix $(v_1, \dots, v_{n-\rho})$ (that is the matrix whose rows are these vectors) is clearly equal to $n - \rho$, since its last $n - \rho$ columns are linearly independent. This means that its rows, the vectors $v_1, \dots, v_{n-\rho}$, are linearly independent. It is easy to see that such rows generate $S_\Sigma$ so they are a basis for it and $\dim(S_\Sigma) = n - \rho$. □

It is clear that the general reduction procedure allows one to solve any homogeneous linear system $\Sigma$. Since the space $S_\Sigma$ is in this case a linear space, one can determine a basis for it. The proof of the previous theorem provides indeed an easy method to get such a basis for $S_\Sigma$. Once the elements in $S_\Sigma$ are written in terms of the $n - \rho$ free unknowns, a basis for $S_\Sigma$ is given by fixing for these unknowns the values corresponding to the elements of the canonical basis in $\mathbb{R}^{n-\rho}$.

Exercise 6.4.4 Let us solve the following homogeneous linear system,

$$\Sigma : \begin{cases} x_1 - 2x_3 + x_5 + x_6 = 0 \\ x_1 - x_2 - x_3 + x_4 - x_5 + x_6 = 0 \\ x_1 - x_2 + 2x_4 - 2x_5 + 2x_6 = 0 \end{cases}$$

and let us determine a basis for its space of solutions. The corresponding matrix $A$ is

$$A = \begin{pmatrix} 1 & 0 & -2 & 0 & 1 & 1 \\ 1 & -1 & -1 & 1 & -1 & 1 \\ 1 & -1 & 0 & 2 & -2 & 2 \end{pmatrix}.$$

We reduce it as follows

$$A \mapsto \begin{pmatrix} 1 & 0 & -2 & 0 & 1 & 1 \\ 0 & -1 & 1 & 1 & -2 & 0 \\ 0 & -1 & 2 & 2 & -3 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & -2 & 0 & 1 & 1 \\ 0 & -1 & 1 & 1 & -2 & 0 \\ 0 & 0 & 1 & 1 & -1 & 1 \end{pmatrix} = A'.$$

Thus $\mathrm{rk}(A) = \mathrm{rk}(A') = 3$. Since the first three columns in $A'$ (and then in $A$) are linearly independent we choose $x_4, x_5, x_6$ to be the free unknowns. One clearly has $\Sigma \sim \Sigma' : A'X = 0$ so we can solve

$$\Sigma' : \begin{cases} x_1 - 2x_3 + x_5 + x_6 = 0 \\ x_2 - x_3 - x_4 + 2x_5 = 0 \\ x_3 + x_4 - x_5 + x_6 = 0 \end{cases}.$$

By setting $x_4 = a$, $x_5 = b$ and $x_6 = c$ we have

$$S_\Sigma = \{(x_1, \dots, x_6) = (-2a + b - 3c,\ -b - c,\ -a + b - c,\ a,\ b,\ c) \mid a, b, c \in \mathbb{R}\}.$$

To determine a basis for $S_\Sigma$ we let $(a, b, c)$ be the vectors $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$ of the canonical basis in $\mathbb{R}^3$, since $n - \rho = 6 - 3 = 3$. With this choice we get the following basis

$$\begin{aligned} v_1 &= (-2, 0, -1, 1, 0, 0) \\ v_2 &= (1, -1, 1, 0, 1, 0) \\ v_3 &= (-3, -1, -1, 0, 0, 1). \end{aligned}$$

Chapter  7 

Linear  Transformations 




Together  with  the  theory  of  linear  equations  and  matrices,  the  notion  of  linear 
transformations  is  crucial  in  both  classical  and  quantum  physics.  In  this  chapter 
we  introduce  them  and  study  their  main  properties. 


7.1  Linear  Transformations  and  Matrices 

We  have  already  seen  that  differently  looking  sets  may  have  the  same  vector  space 
structure.  In  this  chapter  we  study  mappings  between  vector  spaces  which  are,  in  a 
proper  sense,  compatible  with  the  vector  space  structure.  The  action  of  such  maps 
will  be  represented  by  matrices. 

Example 7.1.1 Let $A \in \mathbb{R}^{2,2}$. Let us define the map $f : \mathbb{R}^2 \to \mathbb{R}^2$ by

$$f(X) = AX$$

where $X = (x, y)$ is a (column) vector representing a generic element in $\mathbb{R}^2$ and $AX$ denotes the usual row by column product.

With $X = (x_1, x_2)$ and $Y = (y_1, y_2)$ two elements in $\mathbb{R}^2$, using the properties of the matrix calculus it is easy to show that

$$f(X + Y) = A(X + Y) = AX + AY = f(X) + f(Y)$$

as well as, with $\lambda \in \mathbb{R}$, that

$$f(\lambda X) = A(\lambda X) = \lambda A(X) = \lambda f(X).$$

© Springer International Publishing AG, part of Springer Nature 2018
G. Landi and A. Zampini, Linear Algebra and Analytic Geometry
for Physical Sciences, Undergraduate Lecture Notes in Physics,
https://doi.org/10.1007/978-3-319-78361-1_7

This example is easily generalised to matrices of arbitrary dimensions.

Exercise 7.1.2 Given $A = (a_{ij}) \in \mathbb{R}^{m,n}$ one considers the map $f : \mathbb{R}^n \to \mathbb{R}^m$

$$f(X) = AX,$$

with $X = {}^t(x_1, \dots, x_n)$ and $AX$ the usual row by column product. The above properties are easily generalised so this map satisfies the identities $f(X + Y) = f(X) + f(Y)$ for any $X, Y \in \mathbb{R}^n$ and $f(\lambda X) = \lambda f(X)$ for any $X \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$.


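Example 7.1.3 Let $A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \in \mathbb{R}^{2,3}$. The associated map $f : \mathbb{R}^3 \to \mathbb{R}^2$ is given by

$$f((x, y, z)) = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y + z \\ x - y \end{pmatrix}.$$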

The  above  lines  motivate  the  following. 

Definition 7.1.4 Let $V$ and $W$ be two vector spaces over $\mathbb{R}$. A map $f : V \to W$ is called linear if the following properties hold:

(L1) $f(X + Y) = f(X) + f(Y)$ for all $X, Y \in V$,

(L2) $f(\lambda X) = \lambda f(X)$ for all $X \in V$, $\lambda \in \mathbb{R}$.
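Properties (L1) and (L2) can be tried out numerically on the map of Example 7.1.3. The sketch below is only a spot check on sample vectors, not a proof of linearity, and the helper names `f`, `add` and `scale` are assumptions introduced for the illustration.

```python
from fractions import Fraction

A = [[1, 2, 1], [1, -1, 0]]      # the matrix of Example 7.1.3

def f(X):
    """The map f : R^3 -> R^2 given by the row by column product AX."""
    return tuple(sum(a * x for a, x in zip(row, X)) for row in A)

def add(X, Y):
    return tuple(x + y for x, y in zip(X, Y))

def scale(lam, X):
    return tuple(lam * x for x in X)

X, Y, lam = (1, 0, -2), (3, 5, 4), Fraction(7, 2)
print(f(add(X, Y)) == add(f(X), f(Y)))        # (L1) holds for this sample: True
print(f(scale(lam, X)) == scale(lam, f(X)))   # (L2) holds for this sample: True
print(f((0, 0, 0)))                           # (0, 0)
```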

The  proof  of  the  following  identities  is  immediate. 

Proposition  7.1.5  If  f  :  V  — >  W  is  a  linear  map  then, 

(a)  f(0v)  =  0w, 

(b)  f(-v)  =  —  f  (v)  for  any  v  e  V, 

(c)  f(a\V\  +  •  •  •  +  apvp)  =  a\f(v\)  +  •  •  •  +  apf(vp),forany  v\,  . . . ,  vp  e  V  and 
a\, ...  ,ap  g  R. 

Proof (a) Since $0_V = 0_{\mathbb{R}} 0_V$, the defining property (L2) gives

$$f(0_V) = f(0_{\mathbb{R}} 0_V) = 0_{\mathbb{R}} f(0_V) = 0_W.$$

(b) Since $-v = (-1)v$, again from (L2) we have

$$f(-v) = f((-1)v) = (-1) f(v) = -f(v).$$

(c) This is proved by induction on $p$. If $p = 2$ the claim follows directly from (L1) and (L2) with

$$f(a_1 v_1 + a_2 v_2) = f(a_1 v_1) + f(a_2 v_2) = a_1 f(v_1) + a_2 f(v_2).$$

Let us assume it to be true for $p - 1$. By setting $w = a_1 v_1 + \dots + a_{p-1} v_{p-1}$, we have

$$f(a_1 v_1 + \dots + a_p v_p) = f(w + a_p v_p) = f(w) + f(a_p v_p) = f(w) + a_p f(v_p)$$

(the second equality follows from (L1), the third from (L2)). From the induction hypothesis, we have $f(w) = a_1 f(v_1) + \dots + a_{p-1} f(v_{p-1})$, so

$$f(a_1 v_1 + \dots + a_p v_p) = f(w) + a_p f(v_p) = a_1 f(v_1) + \dots + a_{p-1} f(v_{p-1}) + a_p f(v_p),$$

which is the statement for $p$.

□ 

Example 7.1.6 Example 7.1.1 and Exercise 7.1.2 show how one associates a linear map between $\mathbb{R}^n$ and $\mathbb{R}^m$ to a matrix $A \in \mathbb{R}^{m,n}$. This construction can be generalised by using bases for vector spaces $V$ and $W$.

Let us consider a basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$ and a basis $\mathcal{C} = (w_1, \dots, w_m)$ for $W$. Given the matrix $A = (a_{ij}) \in \mathbb{R}^{m,n}$ we define $f : V \to W$ as follows. For any $v \in V$ we have uniquely $v = x_1 v_1 + \dots + x_n v_n$, that is $v = (x_1, \dots, x_n)_{\mathcal{B}}$. With $X = {}^t(x_1, \dots, x_n)$, we consider the vector $AX \in \mathbb{R}^m$ with $AX = {}^t(y_1, \dots, y_m)$. We write then

$$f(v) = y_1 w_1 + \dots + y_m w_m$$

which can be written as

$$f((x_1, \dots, x_n)_{\mathcal{B}}) = \left( A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right)_{\mathcal{C}}.$$


Exercise 7.1.7 Let us consider the matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \in \mathbb{R}^{2,3}$, with $V = \mathbb{R}[X]_2$ and $W = \mathbb{R}[X]_1$. With respect to the bases $\mathcal{B} = (1, X, X^2)$ for $V$ and $\mathcal{C} = (1, X)$ for $W$ the map corresponding to $A$ as in the previous example is

$$f(a + bX + cX^2) = (A\, {}^t(a, b, c))_{\mathcal{C}}$$

that is

$$f(a + bX + cX^2) = (a + 2b + c,\ a - b)_{\mathcal{C}} = a + 2b + c + (a - b)X.$$
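The map of Exercise 7.1.7 can be sketched in a few lines of Python. This is only an illustration under the assumption that a polynomial $a + bX + cX^2$ is stored as the tuple of its components `(a, b, c)` on the basis $(1, X, X^2)$; the function names `apply` and `f` are hypothetical.

```python
def apply(A, X):
    """Row by column product AX for a matrix A (list of rows) and a vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Matrix of Exercise 7.1.7, with respect to the bases (1, X, X^2) and (1, X).
A = [[1, 2, 1], [1, -1, 0]]

def f(p):
    """p = (a, b, c) holds the components of a + bX + cX^2 on (1, X, X^2).
    The result holds the components of f(p) on the basis (1, X)."""
    return apply(A, list(p))

# Components of f(2 + 3X + 5X^2): (a + 2b + c, a - b) with a=2, b=3, c=5.
print(f((2, 3, 5)))
```

Working with components only, as here, is exactly what the choice of the two bases allows.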

Proposition 7.1.8 The map $f : V \to W$ defined in Example 7.1.6 is linear.

Proof Let $v, v' \in V$ with $v = (x_1, \dots, x_n)_{\mathcal{B}}$ and $v' = (x'_1, \dots, x'_n)_{\mathcal{B}}$. From Remark 2.4.16 we have

$$v + v' = (x_1 + x'_1, \dots, x_n + x'_n)_{\mathcal{B}}$$

so we get

$$f(v + v') = (A\, {}^t(x_1 + x'_1, \dots, x_n + x'_n))_{\mathcal{C}} = (A\, {}^t(x_1, \dots, x_n))_{\mathcal{C}} + (A\, {}^t(x'_1, \dots, x'_n))_{\mathcal{C}} = f(v) + f(v')$$

(notice that the second equality follows from Proposition 4.1.10). Along the same lines one shows easily that for any $\lambda \in \mathbb{R}$ one has $f(\lambda v) = \lambda f(v)$. □

The  following  definition  (a  rephrasing  of  Example  7.1.6)  plays  a  central  role  in 
the  theory  of  linear  transformations. 

Definition 7.1.9 With $V$ and $W$ two vector spaces over $\mathbb{R}$ and bases $\mathcal{B} = (v_1, \dots, v_n)$ for $V$ and $\mathcal{C} = (w_1, \dots, w_m)$ for $W$, consider a matrix $A = (a_{ij}) \in \mathbb{R}^{m,n}$. The linear map

$$f_A^{\mathcal{C},\mathcal{B}} : V \to W$$

defined by

$$V \ni v = x_1 v_1 + \dots + x_n v_n \ \mapsto\ y_1 w_1 + \dots + y_m w_m \in W$$

with

$${}^t(y_1, \dots, y_m) = A\, {}^t(x_1, \dots, x_n)$$

is the linear map corresponding to the matrix $A$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$.

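Remark 7.1.10 Denoting $f_A^{\mathcal{C},\mathcal{B}} = f$, one immediately sees that the $n$ columns in $A$ provide the components with respect to $\mathcal{C}$ in $W$ of the vectors $f(v_1), \dots, f(v_n)$, with $(v_1, \dots, v_n)$ the basis $\mathcal{B}$ for $V$. One has

$$v_1 = 1 v_1 + 0 v_2 + \dots + 0 v_n = (1, 0, \dots, 0)_{\mathcal{B}},$$

thus giving

$$f(v_1) = (A\, {}^t(1, 0, \dots, 0))_{\mathcal{C}} = {}^t(a_{11}, \dots, a_{m1})_{\mathcal{C}},$$

that is

$$f(v_1) = a_{11} w_1 + \dots + a_{m1} w_m.$$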

It is straightforward now to show that $f(v_j) = (a_{1j}, \dots, a_{mj})_{\mathcal{C}}$ for any index $j$. If $A = (a_{ij}) \in \mathbb{R}^{m,n}$ and $f : \mathbb{R}^n \to \mathbb{R}^m$ is the linear map defined by $f(X) = AX$, then the columns of $A$ give the images under $f$ of the vectors $(e_1, \dots, e_n)$ of the canonical basis $\mathcal{E}_n$ in $\mathbb{R}^n$. This can be written as

$$A = (f(e_1)\ f(e_2)\ \cdots\ f(e_n)).$$

Exercise 7.1.11 Let us consider the matrix

$$A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ 1 & 1 & 0 \end{pmatrix}$$

with $\mathcal{B} = \mathcal{C} = \mathcal{E}_3$ the canonical basis in $\mathbb{R}^3$, and the corresponding linear map $f = f_A^{\mathcal{E}_3,\mathcal{E}_3} : \mathbb{R}^3 \to \mathbb{R}^3$. If $(x, y, z) \in \mathbb{R}^3$ then $f((x, y, z)) = A\, {}^t(x, y, z)$. The action of $f$ is then given by

$$f((x, y, z)) = (x + y - z,\ y + 2z,\ x + y).$$

Since $\mathcal{B}$ is the canonical basis, we also have

$$f(e_1) = (1, 0, 1), \quad f(e_2) = (1, 1, 1), \quad f(e_3) = (-1, 2, 0).$$

We see that $f(e_1), f(e_2), f(e_3)$ are the columns of $A$. This is not an accident: as mentioned, the columns of $A$ are, in the general situation, the components of $f(e_1), f(e_2), f(e_3)$ with respect to a basis $\mathcal{C}$, in this case the canonical one.
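The observation that the columns of the matrix are the images of the canonical basis vectors can be turned into a small procedure. The sketch below assumes the map of Exercise 7.1.11 written as a Python function on tuples; the helper name `matrix_of` is hypothetical.

```python
def f(v):
    """The map of Exercise 7.1.11 on the canonical basis of R^3."""
    x, y, z = v
    return (x + y - z, y + 2 * z, x + y)

def matrix_of(g, n):
    """Matrix of g : R^n -> R^m on the canonical bases: column j is g(e_j)."""
    images = [g(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    # The images are the columns, so transpose them to get the list of rows.
    return [list(row) for row in zip(*images)]

A = matrix_of(f, 3)
print(A)
```

The same construction, with components taken on arbitrary bases instead of the canonical ones, is what the general definition of the matrix associated to a linear map formalises.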

Proposition 7.1.8 shows that, given a matrix $A$, the map $f_A^{\mathcal{C},\mathcal{B}}$ is linear. Our aim is now to prove that for any linear map $f : V \to W$ there exists a matrix $A$ such that $f = f_A^{\mathcal{C},\mathcal{B}}$, with respect to two given bases $\mathcal{B}$ and $\mathcal{C}$ for $V$ and $W$ respectively.

In order to determine such a matrix we use Remark 7.1.10: given a matrix $A$, the images under $f_A^{\mathcal{C},\mathcal{B}}$ of the elements in the basis $\mathcal{B}$ of $V$ are given by the columns of $A$. This suggests the following definition.

Definition 7.1.12 Let $\mathcal{B} = (v_1, \dots, v_n)$ be a basis for the real vector space $V$ and $\mathcal{C} = (w_1, \dots, w_m)$ a basis for the real vector space $W$. Let $f : V \to W$ be a linear map. The matrix associated to $f$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$, that we denote by $M_f^{\mathcal{C},\mathcal{B}}$, is the element in $\mathbb{R}^{m,n}$ whose columns are given by the components with respect to $\mathcal{C}$ of the images under $f$ of the basis elements in $\mathcal{B}$. That is, the matrix $M_f^{\mathcal{C},\mathcal{B}} = A = (a_{ij})$ is given by

$$\begin{aligned} f(v_1) &= a_{11} w_1 + \dots + a_{m1} w_m \\ &\ \,\vdots \\ f(v_n) &= a_{1n} w_1 + \dots + a_{mn} w_m, \end{aligned}$$

which can be equivalently written as

$$M_f^{\mathcal{C},\mathcal{B}} = (f(v_1), \dots, f(v_n)).$$

Such a definition inverts the one given in Definition 7.1.9. This is the content of the following proposition, whose proof we omit.

Proposition 7.1.13 Let $V$ be a real vector space with basis $\mathcal{B} = (v_1, \dots, v_n)$ and $W$ a real vector space with basis $\mathcal{C} = (w_1, \dots, w_m)$. The following results hold.

(i) If $f : V \to W$ is a linear map, by setting $A = M_f^{\mathcal{C},\mathcal{B}}$ it holds that

$$f_A^{\mathcal{C},\mathcal{B}} = f.$$

(ii) If $A \in \mathbb{R}^{m,n}$, by setting $f = f_A^{\mathcal{C},\mathcal{B}}$ it holds that

$$M_f^{\mathcal{C},\mathcal{B}} = A.$$

Proposition 7.1.14 Let $V$ and $W$ be two real vector spaces with $(v_1, \dots, v_n)$ a basis for $V$. For any choice of $\{u_1, \dots, u_n\}$ of $n$ elements in $W$ there exists a unique linear map $f : V \to W$ such that $f(v_j) = u_j$ for any $j = 1, \dots, n$.

Proof To define such a map one uses that any vector $v \in V$ can be written uniquely as

$$v = a_1 v_1 + \dots + a_n v_n$$

with respect to the basis $(v_1, \dots, v_n)$. By setting

$$f(v) = a_1 f(v_1) + \dots + a_n f(v_n) = a_1 u_1 + \dots + a_n u_n$$

we have a linear (by construction) map $f$ that satisfies the required condition $f(v_j) = u_j$ for any $j = 1, \dots, n$.

Let us now suppose that there exists a second linear map $g : V \to W$ with $g(v_j) = u_j$ for any $j$. From Proposition 7.1.5 we can then write, for any $v = a_1 v_1 + \dots + a_n v_n$ in $V$,

$$g(v) = a_1 g(v_1) + \dots + a_n g(v_n) = a_1 u_1 + \dots + a_n u_n = f(v),$$

thus getting $g = f$. □

What  we  have  discussed  so  far  gives  two  equivalent  ways  to  define  a  linear  map 
between  two  vector  spaces  V  and  W. 

I. Once a basis $\mathcal{B}$ for $V$, a basis $\mathcal{C}$ for $W$ and a matrix $A = (a_{ij}) \in \mathbb{R}^{m,n}$ are fixed, from Proposition 7.1.13 we know that the linear map $f_A^{\mathcal{C},\mathcal{B}}$ is uniquely determined.

II. Once a basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$ and $n$ vectors $\{u_1, \dots, u_n\}$ in $W$ are fixed, we know from Proposition 7.1.14 that there exists a unique linear map $f : V \to W$ with $f(v_j) = u_j$ for any $j = 1, \dots, n$.

From now on, if $V = \mathbb{R}^n$ and $\mathcal{B} = \mathcal{E}$ is its canonical basis we shall denote by $f((x_1, \dots, x_n))$ what we have previously denoted as $f((x_1, \dots, x_n)_{\mathcal{E}})$. Analogously, with $\mathcal{C} = \mathcal{E}$ the canonical basis for $W = \mathbb{R}^m$ we shall write $(y_1, \dots, y_m)$ instead of $(y_1, \dots, y_m)_{\mathcal{C}}$.

With such a notation, if $f : \mathbb{R}^n \to \mathbb{R}^m$ is the linear map which, with respect to the canonical basis for both vector spaces, corresponds to the matrix $A$, its action is written as

$$f((x_1, \dots, x_n)) = A\, {}^t(x_1, \dots, x_n)$$

or equivalently

$$f((x_1, \dots, x_n)) = (a_{11} x_1 + \dots + a_{1n} x_n,\ \dots,\ a_{m1} x_1 + \dots + a_{mn} x_n).$$


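Exercise 7.1.15 Let $f_0 : V \to W$ be the null (zero) map, that is $f_0(v) = 0_W$ for any $v \in V$. With $\mathcal{B}$ and $\mathcal{C}$ arbitrary bases for $V$ and $W$ respectively, it is clearly

$$M_{f_0}^{\mathcal{C},\mathcal{B}} = 0_{\mathbb{R}^{m,n}},$$

that is the null matrix.

Exercise 7.1.16 If $\mathrm{id}_V(v) = v$ is the identity map on $V$ then, using any basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$, one has the following expression

$$\mathrm{id}_V(v_j) = v_j = (0, \dots, 0, 1, 0, \dots, 0)_{\mathcal{B}},$$

with the entry 1 in the $j$-th position, for any $j = 1, \dots, n$. That is, $M_{\mathrm{id}_V}^{\mathcal{B},\mathcal{B}}$ is the identity matrix $I_n$. Notice that $M_{\mathrm{id}_V}^{\mathcal{C},\mathcal{B}} \neq I_n$ if $\mathcal{B} \neq \mathcal{C}$.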

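Exercise 7.1.17 Let us consider for $\mathbb{R}^3$ both the canonical basis $\mathcal{E}_3 = (e_1, e_2, e_3)$ and the basis $\mathcal{B} = (v_1, v_2, v_3)$ with

$$v_1 = (0, 1, 1), \quad v_2 = (1, 0, 1), \quad v_3 = (1, 1, 0).$$

A direct computation gives

$$M_{\mathrm{id}}^{\mathcal{E}_3,\mathcal{B}} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \qquad M_{\mathrm{id}}^{\mathcal{B},\mathcal{E}_3} = \frac{1}{2} \begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix},$$

and each of these matrices turns out to be the inverse of the other, that is

$$M_{\mathrm{id}}^{\mathcal{E}_3,\mathcal{B}}\, M_{\mathrm{id}}^{\mathcal{B},\mathcal{E}_3} = I_3.$$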
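The claim of Exercise 7.1.17 that the two matrices of the identity map are inverse to each other can be verified with exact arithmetic. The sketch below assumes the basis $v_1 = (0,1,1)$, $v_2 = (1,0,1)$, $v_3 = (1,1,0)$ of the exercise; the variable names and the helper `matmul` are hypothetical, and the entries of the second matrix are obtained by inverting the first.

```python
from fractions import Fraction

def matmul(P, Q):
    """Row by column product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

half = Fraction(1, 2)

# Columns are v1, v2, v3 written on the canonical basis.
M_E_B = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Candidate inverse: columns are e1, e2, e3 written on the basis (v1, v2, v3).
M_B_E = [[half * x for x in row] for row in [[-1, 1, 1], [1, -1, 1], [1, 1, -1]]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(M_E_B, M_B_E) == identity, matmul(M_B_E, M_E_B) == identity)
```

For instance, the first column of `M_B_E` says that $e_1 = \tfrac{1}{2}(-v_1 + v_2 + v_3)$, which can be checked by hand.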
7.2 Basic Notions on Maps

Before  we  proceed  we  recall  in  a  compact  and  direct  way  some  of  the  basic  notions 
concerning  injectivity,  surjectivity  and  bijectivity  of  mappings  between  sets. 

Definition 7.2.1 Let $X$ and $Y$ be two non empty sets and $f : X \to Y$ a map between them. The element $f(x)$ in $Y$ is called the image under $f$ of the element $x \in X$. The set

$$\mathrm{Im}(f) = \{ y \in Y \mid \exists\, x \in X : y = f(x) \}$$

is called the image (or range) of $f$ in $Y$. The set (that might be empty)

$$f^{-1}(y) = \{ x \in X : f(x) = y \}$$

defines the pre-image of the element $y \in Y$.

Definition 7.2.2 Let $X$ and $Y$ be two non empty sets, with a map $f : X \to Y$. One says that:

(i) $f$ is injective if, for any pair $x_1, x_2 \in X$ with $x_1 \neq x_2$, it is $f(x_1) \neq f(x_2)$,

(ii) $f$ is surjective if $\mathrm{Im}(f) = Y$,

(iii) $f$ is bijective if $f$ is both injective and surjective.

Definition 7.2.3 Let $f : X \to Y$ and $g : Y \to Z$ be two maps. The composition of $g$ with $f$ is the map

$$g \circ f : X \to Z$$

defined as $(g \circ f)(x) = g(f(x))$ for any $x \in X$.

Definition 7.2.4 A map $f : X \to Y$ is invertible if there exists a map $g : Y \to X$ such that $g \circ f = \mathrm{id}_X$ and $f \circ g = \mathrm{id}_Y$. In such a case the map $g$ is called the inverse of $f$ and denoted by $f^{-1}$. It is possible to prove that, if $f$ is invertible, then $f^{-1}$ is unique.

Proposition 7.2.5 A map $f : X \to Y$ is invertible if and only if it is bijective. In such a case the map $f^{-1}$ is invertible as well, with $(f^{-1})^{-1} = f$.


7.3  Kernel  and  Image  of  a  Linear  Map 

Injectivity  and  surjectivity  of  a  linear  map  are  measured  by  two  vector  subspaces 
that  we  now  introduce  and  study. 

Definition 7.3.1 Consider a linear map $f : V \to W$. The set

$$V \supseteq \ker(f) = \{ v \in V : f(v) = 0_W \}$$

is called the kernel of $f$, while the set

$$W \supseteq \mathrm{Im}(f) = \{ w \in W : \exists\, v \in V : w = f(v) \}$$

is called the image of $f$.

Theorem 7.3.2 Given a linear map $f : V \to W$, the set $\ker(f)$ is a vector subspace in $V$ and $\mathrm{Im}(f)$ is a vector subspace in $W$.

Proof We recall Proposition 2.2.2. Given $v, v' \in \ker(f)$ and $\lambda, \lambda' \in \mathbb{R}$ we need to compute $f(\lambda v + \lambda' v')$. Since $f(v) = 0_W = f(v')$ by hypothesis, from Proposition 7.1.5 we have $f(\lambda v + \lambda' v') = \lambda f(v) + \lambda' f(v') = 0_W$. This shows that $\ker(f)$ is a vector subspace in $V$.

Analogously, let $w, w' \in \mathrm{Im}(f)$ and $\lambda, \lambda' \in \mathbb{R}$. From the hypothesis there exist $v, v' \in V$ such that $w = f(v)$ and $w' = f(v')$; thus we can write $\lambda w + \lambda' w' = \lambda f(v) + \lambda' f(v') = f(\lambda v + \lambda' v') \in \mathrm{Im}(f)$, again from Proposition 7.1.5. This shows that $\mathrm{Im}(f)$ is a vector subspace in $W$. □

Having  proved  that  Im (/)  and  ker (/)  are  vector  subspaces  we  look  for  a  system 
of  generators  for  them.  Such  a  task  is  easier  for  the  image  of  /  as  the  following 
lemma  shows. 

Lemma 7.3.3 With $f : V \to W$ a linear map, one has that $\mathrm{Im}(f) = \mathcal{L}(f(v_1), \dots, f(v_n))$, where $\mathcal{B} = (v_1, \dots, v_n)$ is an arbitrary basis for $V$. The map $f$ is indeed surjective if and only if $f(v_1), \dots, f(v_n)$ generate $W$.

Proof Let $w \in \mathrm{Im}(f)$, that is $w = f(v)$ for some $v \in V$. Since $\mathcal{B}$ is a basis for $V$, one has $v = a_1 v_1 + \dots + a_n v_n$ and since $f$ is linear, one has $w = a_1 f(v_1) + \dots + a_n f(v_n)$, thus giving $w \in \mathcal{L}(f(v_1), \dots, f(v_n))$. We have then $\mathrm{Im}(f) \subseteq \mathcal{L}(f(v_1), \dots, f(v_n))$.

The opposite inclusion is obvious since $\mathrm{Im}(f)$ is a vector subspace in $W$ and contains the vectors $f(v_1), \dots, f(v_n)$.

The last statement is the fact that $f$ is surjective (Definition 7.2.2) if and only if $\mathrm{Im}(f) = W$. □

Exercise 7.3.4 Let us consider the linear map $f : \mathbb{R}^3 \to \mathbb{R}^2$ given by

$$f((x, y, z)) = (x + y - z,\ x - y + z).$$

From the lemma above, the vector subspace $\mathrm{Im}(f)$ is generated by the images under $f$ of an arbitrary basis in $\mathbb{R}^3$. With the canonical basis $\mathcal{E} = (e_1, e_2, e_3)$ we have $\mathrm{Im}(f) = \mathcal{L}(f(e_1), f(e_2), f(e_3))$, with

$$f(e_1) = (1, 1), \quad f(e_2) = (1, -1), \quad f(e_3) = (-1, 1).$$

Since $f(e_1)$ and $f(e_2)$ are linearly independent, it is immediate to see that $\mathrm{Im}(f) = \mathbb{R}^2$, that is $f$ is surjective.

Lemma 7.3.5 Let $f : V \to W$ be a linear map between two real vector spaces. Then,

(i) $f$ is injective if and only if $\ker(f) = \{0_V\}$,

(ii) if $f$ is injective and $(v_1, \dots, v_n)$ is a basis for $V$, the vectors $f(v_1), \dots, f(v_n)$ are linearly independent.

Proof (i) Let us assume that $f$ is injective and $v \in \ker(f)$, that is $f(v) = 0_W$. From Proposition 7.1.5 we know that $f(0_V) = 0_W$. Since $f$ is injective it must be $v = 0_V$, that is $\ker(f) = \{0_V\}$.

Vice versa, let us assume that $\ker(f) = \{0_V\}$ and let us consider two vectors $v_1, v_2$ such that $f(v_1) = f(v_2)$. Since $f$ is linear this reads $0_W = f(v_1) - f(v_2) = f(v_1 - v_2)$, that is $v_1 - v_2 \in \ker(f)$. Since the kernel is the null vector subspace, this gives $v_1 = v_2$.

(ii) In order to study the linear independence of the system of vectors $\{f(v_1), \dots, f(v_n)\}$ let us take scalars $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ such that $\lambda_1 f(v_1) + \dots + \lambda_n f(v_n) = 0_W$. Since $f$ is linear, this gives $f(\lambda_1 v_1 + \dots + \lambda_n v_n) = 0_W$ and then $\lambda_1 v_1 + \dots + \lambda_n v_n \in \ker(f)$. Since $f$ is injective, from (i) we have $\ker(f) = \{0_V\}$ so it is $\lambda_1 v_1 + \dots + \lambda_n v_n = 0_V$. Since $(v_1, \dots, v_n)$ is a basis for $V$, we have that $\lambda_1 = \dots = \lambda_n = 0_{\mathbb{R}}$, thus proving that also $f(v_1), \dots, f(v_n)$ are linearly independent.

□ 


Exercise 7.3.6 Let us consider the linear map $f : \mathbb{R}^2 \to \mathbb{R}^3$ given by

$$f((x, y)) = (x + y,\ x - y,\ 2x + 3y).$$

The kernel of $f$ is given by

$$\ker(f) = \{ (x, y) \in \mathbb{R}^2 \mid f((x, y)) = (x + y,\ x - y,\ 2x + 3y) = (0, 0, 0) \}$$

so we have to solve the linear system

$$\begin{cases} x + y = 0 \\ x - y = 0 \\ 2x + 3y = 0 \end{cases}.$$

Its unique solution is $(0, 0)$ so $\ker(f) = \{0_{\mathbb{R}^2}\}$ and we can conclude, from the lemma above, that $f$ is injective. From the same lemma we also know that the images under $f$ of a basis for $\mathbb{R}^2$ make a linearly independent set of vectors. If we take the canonical basis for $\mathbb{R}^2$ with $e_1 = (1, 0)$ and $e_2 = (0, 1)$, we have

$$f(e_1) = (1, 1, 2), \quad f(e_2) = (1, -1, 3).$$
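The conclusions of Exercise 7.3.6 can be cross-checked by computing ranks. The sketch below is an illustration only: it assumes the standard fact that a homogeneous system has only the trivial solution exactly when the rank of its matrix equals the number of unknowns, and the function `rank` is a hypothetical helper implementing Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Matrix of f((x, y)) = (x + y, x - y, 2x + 3y) on the canonical bases.
A = [[1, 1], [1, -1], [2, 3]]
n = len(A[0])

kernel_is_trivial = rank(A) == n
# f(e_1), f(e_2) are the columns of A; they are independent when their rank is 2.
images = [[row[j] for row in A] for j in range(n)]
images_independent = rank(images) == len(images)
print(kernel_is_trivial, images_independent)
```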


7.4 Isomorphisms

Definition 7.4.1 Let $V$ and $W$ be two real vector spaces. A bijective linear map $f : V \to W$ is called an isomorphism. Two vector spaces are said to be isomorphic if there exists an isomorphism between them. If $f : V \to W$ is an isomorphism we write $V \cong W$.

Proposition 7.4.2 If the map $f : V \to W$ is an isomorphism, such is its inverse $f^{-1} : W \to V$.

Proof From Proposition 7.2.5 we have that $f$ is invertible, with an invertible inverse map $f^{-1}$. We need to prove that $f^{-1}$ is linear. Let us consider two arbitrary vectors $w_1, w_2 \in W$ with $v_1 = f^{-1}(w_1)$ and $v_2 = f^{-1}(w_2)$ in $V$; this is equivalent to $w_1 = f(v_1)$ and $w_2 = f(v_2)$. Let us consider also $\lambda_1, \lambda_2 \in \mathbb{R}$. Since $f$ is linear we can write

$$\lambda_1 w_1 + \lambda_2 w_2 = f(\lambda_1 v_1 + \lambda_2 v_2).$$

The action of $f^{-1}$ is then

$$f^{-1}(\lambda_1 w_1 + \lambda_2 w_2) = \lambda_1 v_1 + \lambda_2 v_2 = \lambda_1 f^{-1}(w_1) + \lambda_2 f^{-1}(w_2),$$

which amounts to saying that $f^{-1}$ is a linear map. □

In  order  to  characterise  isomorphisms  we  first  prove  a  preliminary  result. 

Lemma 7.4.3 Let $f : V \to W$ be a linear map with $(v_1, \dots, v_n)$ a basis for $V$. The map $f$ is an isomorphism if and only if $(f(v_1), \dots, f(v_n))$ is a basis for $W$.

Proof If $f$ is an isomorphism, it is both injective and surjective. From Lemma 7.3.3 the system $f(v_1), \dots, f(v_n)$ generates $W$, while from Lemma 7.3.5 such a system is linearly independent. This means that $(f(v_1), \dots, f(v_n))$ is a basis for $W$.

Let us now assume that the vectors $(f(v_1), \dots, f(v_n))$ are a basis for $W$. From Proposition 7.1.14 there exists a linear map $g : W \to V$ such that $g(f(v_j)) = v_j$ for any $j = 1, \dots, n$. This means that the linear maps $g \circ f$ and $\mathrm{id}_V$ coincide on the basis $(v_1, \dots, v_n)$ in $V$ and then (again from Proposition 7.1.14) they coincide, that is $g \circ f = \mathrm{id}_V$. Along the same lines it is easy to show that $f \circ g = \mathrm{id}_W$, so we have $g = f^{-1}$; the map $f$ is then invertible so it is an isomorphism. □

Theorem  7.4.4  Let  V  and  W  be  two  real  vector  spaces.  They  are  isomorphic  if  and 
only  if  dim(V)  =  dim(W). 

Proof Let us assume $V$ and $W$ to be isomorphic, that is there exists an isomorphism $f : V \to W$. From the previous lemma, if $(v_1, \dots, v_n)$ is a basis for $V$, then $(f(v_1), \dots, f(v_n))$ is a basis for $W$ and this gives $\dim(V) = n = \dim(W)$.

Let us now assume $n = \dim(V) = \dim(W)$ and try to define an isomorphism $f : V \to W$. By fixing a basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$ and a basis $\mathcal{C} = (w_1, \dots, w_n)$ for $W$, we define the linear map $f(v_j) = w_j$ for any $j$. Such a linear map exists and is unique from Proposition 7.1.14. From the lemma above, $f$ is an isomorphism since it maps the basis $\mathcal{B}$ for $V$ to the basis $\mathcal{C}$ for $W$. □

Corollary 7.4.5 If $V$ is a real vector space with $\dim(V) = n$, then $V \cong \mathbb{R}^n$. Any choice of a basis $\mathcal{B}$ for $V$ induces the natural isomorphism

$$\alpha : V \to \mathbb{R}^n \quad \text{given by} \quad (x_1, \dots, x_n)_{\mathcal{B}} \mapsto (x_1, \dots, x_n).$$

Proof The first claim follows directly from Theorem 7.4.4 above. Once the basis $\mathcal{B} = (v_1, \dots, v_n)$ is chosen, the map $\alpha$ is defined as the linear map such that $\alpha(v_j) = e_j$ for any $j = 1, \dots, n$. From Lemma 7.4.3 such a map $\alpha$ is an isomorphism. It is indeed immediate to check that the action of $\alpha$ on any vector in $V$ is given by $\alpha : (x_1, \dots, x_n)_{\mathcal{B}} \mapsto (x_1, \dots, x_n)$. □

Exercise 7.4.6 Let $V = \mathbb{R}[X]_2$ be the space of the polynomials whose degree is not higher than 2. As we know, $V$ has dimension 3 and a basis for it is given by $\mathcal{B} = (1, X, X^2)$. The isomorphism $\alpha : \mathbb{R}[X]_2 \to \mathbb{R}^3$ corresponding to such a basis reads

$$a + bX + cX^2 \mapsto (a, b, c).$$

It is simple to check whether a given system of polynomials is a basis for $\mathbb{R}[X]_2$. As an example we consider

$$p_1(X) = 3X - X^2, \quad p_2(X) = 1 + X, \quad p_3(X) = 2 + 3X^2.$$

By setting $v_1 = \alpha(p_1) = (0, 3, -1)$, $v_2 = \alpha(p_2) = (1, 1, 0)$ and $v_3 = \alpha(p_3) = (2, 0, 3)$, it is clear that the rank of the matrix whose columns are the vectors $v_1, v_2, v_3$ is 3, thus proving that $(v_1, v_2, v_3)$ is a basis for $\mathbb{R}^3$. Since $\alpha$ is an isomorphism, the inverse $\alpha^{-1} : \mathbb{R}^3 \to \mathbb{R}[X]_2$ is an isomorphism as well: the vectors $\alpha^{-1}(v_1), \alpha^{-1}(v_2), \alpha^{-1}(v_3)$ provide a basis for $\mathbb{R}[X]_2$ and coincide with the given polynomials $p_1(X), p_2(X), p_3(X)$.
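The rank claim in Exercise 7.4.6 can be confirmed by a determinant, since a square matrix of order 3 has rank 3 exactly when its determinant is non zero. In the sketch below a polynomial is stored, as an assumption made only for this illustration, as a dictionary mapping each degree to its coefficient; the names `alpha` and `det3` are hypothetical.

```python
def alpha(p):
    """Components on the basis (1, X, X^2) of a polynomial given as {degree: coefficient}."""
    return tuple(p.get(k, 0) for k in range(3))

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

p1 = {1: 3, 2: -1}     # 3X - X^2
p2 = {0: 1, 1: 1}      # 1 + X
p3 = {0: 2, 2: 3}      # 2 + 3X^2

vectors = [alpha(p) for p in (p1, p2, p3)]
matrix = [list(row) for row in zip(*vectors)]   # v1, v2, v3 as columns
is_basis = det3(matrix) != 0
print(vectors, det3(matrix), is_basis)
```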

Theorem 7.4.4 shows that a linear isomorphism exists only if its domain has the same dimension as its codomain. A condition that characterises isomorphisms can then be introduced only for vector spaces with the same dimension. This is done in the following sections.


7.5  Computing  the  Kernel  of  a  Linear  Map 

We have seen that isomorphisms can be defined only between spaces with the same dimension. For a linear map, not being an isomorphism means failing to be injective or surjective. In this section and the following one we characterise injectivity and surjectivity of a linear map via the study of its kernel and its image. In particular, we shall describe procedures to exhibit bases for such spaces.

Proposition 7.5.1 Let $f : V \to W$ be a linear map between real vector spaces, and $\dim(V) = n$. Fix a basis $\mathcal{B}$ for $V$ and a basis $\mathcal{C}$ for $W$, with associated matrix $A = M_f^{\mathcal{C},\mathcal{B}}$. By denoting $\Sigma : AX = 0$ the linear system associated to $A$, the following hold:

(i) $S_\Sigma \cong \ker(f)$ via the isomorphism $(x_1, \dots, x_n) \mapsto (x_1, \dots, x_n)_{\mathcal{B}}$,

(ii) $\dim(\ker(f)) = n - \mathrm{rk}(A)$,

(iii) if $(v_1, \dots, v_p)$ is a basis for $S_\Sigma$, the vectors $((v_1)_{\mathcal{B}}, \dots, (v_p)_{\mathcal{B}})$ are a basis for $\ker(f)$.

Proof (i) With the given hypothesis, from the definition of the kernel of a linear map we can write

$$\begin{aligned} \ker(f) &= \{ v \in V : f(v) = 0_W \} \\ &= \left\{ (x_1, \dots, x_n)_{\mathcal{B}} \in V : \left( A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right)_{\mathcal{C}} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}_{\mathcal{C}} \right\} \\ &= \{ (x_1, \dots, x_n)_{\mathcal{B}} \in V : (x_1, \dots, x_n) \in S_\Sigma \} \end{aligned}$$

with $S_\Sigma$ denoting the space of solutions for $\Sigma$. As in Corollary 7.4.5 we can then write down the isomorphism $S_\Sigma \to \ker(f)$ given by

$$(x_1, \dots, x_n) \mapsto (x_1, \dots, x_n)_{\mathcal{B}}.$$

(ii) From the isomorphism of the previous point we then have

$$\dim(\ker(f)) = \dim(S_\Sigma) = n - \mathrm{rk}(A)$$

where the last equality follows from Theorem 6.4.3.

(iii) From Lemma 7.4.3 we know that, under the isomorphism $S_\Sigma \to \ker(f)$, a basis for $S_\Sigma$ is mapped into a basis for $\ker(f)$.

□ 

Exercise 7.5.2 Consider the linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

$$f((x, y, z)_{\mathcal{B}}) = (x + y - z,\ x - y + z,\ 2x)_{\mathcal{E}}$$

where $\mathcal{B} = ((1, 1, 0), (0, 1, 1), (1, 0, 1))$ and $\mathcal{E}$ is the canonical basis for $\mathbb{R}^3$. We determine $\ker(f)$ and compute a basis for it with respect to both $\mathcal{B}$ and $\mathcal{E}$. Start by considering the matrix associated to the linear map $f$ with the given bases,

$$A = M_f^{\mathcal{E},\mathcal{B}} = \begin{pmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 2 & 0 & 0 \end{pmatrix}.$$

To solve the linear system $\Sigma : AX = 0$ we reduce the matrix $A$ by rows:

$$A \mapsto \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & 0 \\ 2 & 0 & 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

and the space of the solutions of $\Sigma$ is then given by

$$S_\Sigma = \{ (0, a, a) : a \in \mathbb{R} \} = \mathcal{L}((0, 1, 1)).$$

This reads

$$\ker(f) = \{ (0, a, a)_{\mathcal{B}} : a \in \mathbb{R} \},$$

with a basis given by the vector $(0, 1, 1)_{\mathcal{B}}$. With the explicit expression of the elements of $\mathcal{B}$,

$$(0, 1, 1)_{\mathcal{B}} = (0, 1, 1) + (1, 0, 1) = (1, 1, 2).$$

This shows that the basis vector for $\ker(f)$ given by $(0, 1, 1)$ on the basis $\mathcal{B}$ is the same as the basis vector $(1, 1, 2)$ with respect to the canonical basis $\mathcal{E}$ for $\mathbb{R}^3$.
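The steps of Exercise 7.5.2 can be retraced numerically. The sketch below is an illustration under the data of the exercise; the helper names `apply`, `from_basis` and `rank` are hypothetical, and `rank` uses Gaussian elimination over the rationals.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def apply(A, X):
    """Row by column product AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def from_basis(components, basis):
    """Canonical components of the vector with the given components on `basis`."""
    return tuple(sum(c * v[i] for c, v in zip(components, basis))
                 for i in range(len(basis[0])))

A = [[1, 1, -1], [1, -1, 1], [2, 0, 0]]          # matrix of f in Exercise 7.5.2
basis_B = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]

kernel_dimension = len(A[0]) - rank(A)           # n - rk(A)
solution = (0, 1, 1)                             # generator of the solution space
print(kernel_dimension, apply(A, solution), from_basis(solution, basis_B))
```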

Exercise 7.5.3 With canonical bases $\mathcal{E}$, consider the linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ given by

$$f((x, y, z)) = (x + y - z,\ x - y + z,\ 2x).$$

To determine the space $\ker(f)$, we observe that the matrix associated to $f$ is the same matrix as in the previous exercise, so the linear system $\Sigma : AX = 0$ has solutions $S_\Sigma = \mathcal{L}((0, 1, 1)) = \ker(f)$, since $\mathcal{E}$ is the canonical basis.

Since the kernel of a linear map is the preimage of the null vector in the image space, we can generalise the above procedure to compute the preimage of any element $w \in W$. We denote it as $f^{-1}(w) = \{ v \in V : f(v) = w \}$, with $\ker(f) = f^{-1}(0_W)$. Notice that we denote the preimage of a set under $f$ by writing $f^{-1}$ also when $f$ is not invertible.

Proposition 7.5.4 Consider a real vector space $V$ with basis $\mathcal{B}$ and a real vector space $W$ with basis $\mathcal{C}$. Let $f : V \to W$ be a linear map with $A = M_f^{\mathcal{C},\mathcal{B}}$ its corresponding matrix. Given any $w = (y_1, \dots, y_m)_{\mathcal{C}} \in W$, it is

$$f^{-1}(w) = \{ (x_1, \dots, x_n)_{\mathcal{B}} \in V : A\, {}^t(x_1, \dots, x_n) = {}^t(y_1, \dots, y_m) \}.$$

Proof It is indeed true that, with $v = (x_1, \dots, x_n)_{\mathcal{B}}$, one has

$$f(v) = f((x_1, \dots, x_n)_{\mathcal{B}}) = (A\, {}^t(x_1, \dots, x_n))_{\mathcal{C}}.$$

The equality $f(v) = w$ is the equality of components, given by $A\, {}^t(x_1, \dots, x_n) = {}^t(y_1, \dots, y_m)$, on the basis $\mathcal{C}$. □

Remark 7.5.5 This fact can be expressed via linear systems. Given $w \in W$, its preimage $f^{-1}(w)$ is made of vectors in $V$ whose components with respect to $\mathcal{B}$ solve the linear system $AX = B$, where $B$ is the column of the components of $w$ with respect to $\mathcal{C}$.

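Exercise 7.5.6 Consider the linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ given in Exercise 7.5.2. We compute $f^{-1}(w)$ for $w = (1, 1, 1)$. We have then to solve the system $\Sigma : AX = B$, with $B = {}^t(1, 1, 1)$. We reduce the matrix $(A, B)$ as follows

$$(A, B) = \left( \begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 2 & 0 & 0 & 1 \end{array} \right) \mapsto \left( \begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 2 & 0 & 0 & 2 \\ 2 & 0 & 0 & 1 \end{array} \right) \mapsto \left( \begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 2 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 \end{array} \right).$$

This shows that the system $\Sigma$ has no solution, that is $w \notin \mathrm{Im}(f)$.

Next, let us compute $f^{-1}(u)$ for $u = (2, 0, 2)$, so we have the linear system $\Sigma : AX = B$ with $B = {}^t(2, 0, 2)$. Reducing by rows, we have

$$(A, B) = \left( \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 1 & -1 & 1 & 0 \\ 2 & 0 & 0 & 2 \end{array} \right) \mapsto \left( \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 2 & 0 & 0 & 2 \\ 2 & 0 & 0 & 2 \end{array} \right) \mapsto \left( \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 2 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{array} \right).$$

The system $\Sigma$ is then equivalent to

$$\begin{cases} x = 1 \\ y = z + 1 \end{cases}$$

whose space of solutions is $S_\Sigma = \{ (1, a + 1, a) : a \in \mathbb{R} \}$. We can then write

$$f^{-1}(2, 0, 2) = \{ (1, a + 1, a)_{\mathcal{B}} : a \in \mathbb{R} \} = \{ (a + 1,\ a + 2,\ 2a + 1) : a \in \mathbb{R} \}.$$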
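Both computations of Exercise 7.5.6 can be cross-checked. The sketch below assumes the standard criterion that $AX = B$ is solvable exactly when $A$ and the augmented matrix $(A, B)$ have the same rank; the helper names `rank`, `solvable`, `apply` and `from_basis` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def solvable(A, b):
    """True when AX = b has a solution (same rank for A and the augmented matrix)."""
    return rank(A) == rank([row + [bi] for row, bi in zip(A, b)])

def apply(A, X):
    """Row by column product AX."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def from_basis(components, basis):
    """Canonical components of the vector with the given components on `basis`."""
    return tuple(sum(c * v[i] for c, v in zip(components, basis))
                 for i in range(len(basis[0])))

A = [[1, 1, -1], [1, -1, 1], [2, 0, 0]]
basis_B = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]

w_has_preimage = solvable(A, [1, 1, 1])
u_has_preimage = solvable(A, [2, 0, 2])
# Sample a few values of the parameter a in the family (1, a + 1, a).
family_ok = all(
    apply(A, (1, a + 1, a)) == [2, 0, 2]
    and from_basis((1, a + 1, a), basis_B) == (a + 1, a + 2, 2 * a + 1)
    for a in range(-3, 4)
)
print(w_has_preimage, u_has_preimage, family_ok)
```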

7.6  Computing  the  Image  of  a  Linear  Map 

We  next  turn  to  the  study  of  the  image  of  a  linear  map. 

Proposition 7.6.1 Let $f : V \to W$ be a linear map between real vector spaces, with $\dim(V) = n$ and $\dim(W) = m$. Fix a basis $\mathcal{B}$ for $V$ and a basis $\mathcal{C}$ for $W$, with associated matrix $A = M_f^{\mathcal{C},\mathcal{B}}$ and with $C(A)$ its space of columns. The following results hold:

(i) $\mathrm{Im}(f) \cong C(A)$ via the isomorphism $(y_1, \dots, y_m)_{\mathcal{C}} \mapsto (y_1, \dots, y_m)$,

(ii) $\dim(\mathrm{Im}(f)) = \mathrm{rk}(A)$,

(iii) if $(w_1, \dots, w_r)$ is a basis for $C(A)$, then $((w_1)_{\mathcal{C}}, \dots, (w_r)_{\mathcal{C}})$ is a basis for $\mathrm{Im}(f)$.

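Proof (i) With the given hypothesis, from the definition of the image of a linear map we can write

$$\begin{aligned} \mathrm{Im}(f) &= \{ w \in W : \exists\, v \in V : w = f(v) \} \\ &= \left\{ w = (y_1, \dots, y_m)_{\mathcal{C}} \in W : \exists\, (x_1, \dots, x_n)_{\mathcal{B}} \in V : \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}_{\mathcal{C}} = \left( A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right)_{\mathcal{C}} \right\} \\ &= \left\{ (y_1, \dots, y_m)_{\mathcal{C}} \in W : \exists\, (x_1, \dots, x_n) \in \mathbb{R}^n : \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \right\}. \end{aligned}$$

Representing the matrix $A$ by its columns, that is $A = (C_1 \cdots C_n)$, we have

$$A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 C_1 + \dots + x_n C_n.$$

We can therefore write

$$\begin{aligned} \mathrm{Im}(f) &= \left\{ (y_1, \dots, y_m)_{\mathcal{C}} \in W : \exists\, (x_1, \dots, x_n) \in \mathbb{R}^n : \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = x_1 C_1 + \dots + x_n C_n \right\} \\ &= \left\{ (y_1, \dots, y_m)_{\mathcal{C}} \in W : \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \in C(A) \right\}. \end{aligned}$$

We have then the isomorphism $C(A) \to \mathrm{Im}(f)$ defined by

$$(y_1, \dots, y_m) \mapsto (y_1, \dots, y_m)_{\mathcal{C}}$$

(compare this with the one in Corollary 7.4.5).

(ii) Since $\mathrm{Im}(f) \cong C(A)$, it is $\dim(\mathrm{Im}(f)) = \dim(C(A)) = \mathrm{rk}(A)$.

(iii) The claim follows from (i) and Lemma 7.4.3.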

□ 

Remark 7.6.2 To determine a basis for $C(A)$ as in (iii) above, one can proceed as follows.

(a) If the rank $r$ of $A$ is known, one has to select $r$ linearly independent columns: they will give a basis for $C(A)$.

(b) If the rank of $A$ is not known, by denoting $A'$ the matrix obtained from $A$ by reduction by columns, a basis for $C(A)$ is given by the $r$ non zero columns of $A'$.

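Exercise 7.6.3 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map with associated matrix

$$A = M_f^{\mathcal{C},\mathcal{E}} = \begin{pmatrix} 1 & -1 & 2 \\ 0 & 1 & -3 \\ 2 & -1 & 1 \end{pmatrix}$$

for the canonical basis $\mathcal{E}$ and basis $\mathcal{C} = (w_1, w_2, w_3)$, with $w_1 = (1, 1, 0)$, $w_2 = (0, 1, 1)$, $w_3 = (1, 0, 1)$. We reduce $A$ by columns

$$A \xrightarrow[\ C_3 \mapsto C_3 - 2C_1\ ]{\ C_2 \mapsto C_2 + C_1\ } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 2 & 1 & -3 \end{pmatrix} \xrightarrow{\ C_3 \mapsto C_3 + 3C_2\ } \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 1 & 0 \end{pmatrix} = A'.$$

Since $A'$ is reduced by columns, its non zero columns yield a basis for the space $C(A)$. Thus, $C(A) = C(A') = \mathcal{L}((1, 0, 2), (0, 1, 1))$. From Proposition 7.6.1 a basis for $\mathrm{Im}(f)$ is given by the pair $(u_1, u_2)$,

$$\begin{aligned} u_1 &= (1, 0, 2)_{\mathcal{C}} = w_1 + 2 w_3 = (3, 1, 2) \\ u_2 &= (0, 1, 1)_{\mathcal{C}} = w_2 + w_3 = (1, 1, 2). \end{aligned}$$

Clearly, $\dim(\mathrm{Im}(f)) = 2 = \mathrm{rk}(A)$.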
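The outcome of Exercise 7.6.3 can be verified without repeating the column reduction: two families of vectors span the same subspace when each family, and their union, have the same rank. The following sketch assumes the data of the exercise; the helper names `rank` and `from_basis` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def from_basis(components, basis):
    """Canonical components of the vector with the given components on `basis`."""
    return tuple(sum(c * v[i] for c, v in zip(components, basis))
                 for i in range(len(basis[0])))

A = [[1, -1, 2], [0, 1, -3], [2, -1, 1]]
basis_C = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]

columns = [[row[j] for row in A] for j in range(3)]   # the columns of A
generators = [[1, 0, 2], [0, 1, 1]]                   # non zero columns of A'

same_span = rank(columns) == rank(generators) == rank(columns + generators)
u1 = from_basis(generators[0], basis_C)
u2 = from_basis(generators[1], basis_C)
print(rank(A), same_span, u1, u2)
```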

From  the  previous  results  we  have  the  following  theorem. 

Theorem 7.6.4 Let $f : V \to W$ be a linear map. It holds that

$$\dim(\ker(f)) + \dim(\mathrm{Im}(f)) = \dim(V).$$

Proof Let $A$ be any matrix associated to $f$ (that is, irrespective of the bases chosen in $V$ and $W$). From Proposition 7.5.1 one has $\dim(\ker(f)) = \dim(V) - \mathrm{rk}(A)$, while from Proposition 7.6.1 one has $\dim(\mathrm{Im}(f)) = \mathrm{rk}(A)$. The claim follows.

□ 


From  this  theorem,  the  next  corollary  follows  easily. 

Corollary 7.6.5 Let $f : V \to W$ be a linear map, with $\dim(V) = \dim(W)$. The following statements are equivalent.

(i) $f$ is injective,

(ii) $f$ is surjective,

(iii) $f$ is an isomorphism.

Proof Clearly it is sufficient to prove the equivalence (i) $\Leftrightarrow$ (ii). From Lemma 7.3.5 we know that $f$ is injective if and only if $\dim(\ker(f)) = 0$. We also know that $f$ is surjective if and only if $\dim(\operatorname{Im}(f)) = \dim(W)$. Since $\dim(V) = \dim(W)$ by hypothesis, the statement thus follows from Theorem 7.6.4. □

7.7 Injectivity and Surjectivity Criteria


In  this  section  we  study  conditions  for  injectivity  and  surjectivity  of  a  linear  map 
through  properties  of  its  associated  matrix. 

Proposition 7.7.1 (Injectivity criterion) Let $f : V \to W$ be a linear map. Then $f$ is injective if and only if $\operatorname{rk}(A) = \dim(V)$ for any matrix $A$ associated to $f$ (that is, irrespective of the bases with respect to which the matrix $A$ is given).

Proof From (i) in Lemma 7.3.5 we know that $f$ is injective if and only if $\ker(f) = \{0_V\}$, which means $\dim(\ker(f)) = 0$. From Proposition 7.5.1 we have that $\dim(\ker(f)) = \dim(V) - \operatorname{rk}(A)$ for any matrix $A$ associated to $f$. We then have that $f$ is injective if and only if $\dim(V) - \operatorname{rk}(A) = 0$. □

Exercise 7.7.2 Let $f : \mathbb{R}[X]_2 \to \mathbb{R}^{2,2}$ be the linear map associated to the matrix

\[ A = \begin{pmatrix} 2&1&0\\ -1&0&1\\ 2&1&1\\ 1&0&0 \end{pmatrix} \]

with respect to two given bases. Since $A$ is already reduced by columns, $\operatorname{rk}(A) = 3$, the number of its non zero columns. Since $\dim(\mathbb{R}[X]_2) = 3$ we have, from Proposition 7.7.1, that $f$ is injective.

Proposition 7.7.3 (Surjectivity criterion) Let $f : V \to W$ be a linear map. The map $f$ is surjective if and only if $\operatorname{rk}(A) = \dim(W)$ for any matrix $A$ associated to $f$ (again irrespective of the bases with respect to which the matrix $A$ is given).

Proof This follows directly from Proposition 7.6.1. □

Exercise 7.7.4 Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear map given by

\[ f(x,y,z) = (x + y - z,\; 2x - y + 2z). \]

With $\mathcal{E}$ the canonical basis in $\mathbb{R}^3$ and $\mathcal{C}$ the canonical basis in $\mathbb{R}^2$, we have

\[ A = M_f^{\mathcal{C},\mathcal{E}} = \begin{pmatrix} 1&1&-1\\ 2&-1&2\end{pmatrix}, \]

which we reduce by rows to a matrix $A'$; for instance, $R_2 \mapsto R_2 - 2R_1$ gives

\[ A \mapsto A' = \begin{pmatrix} 1&1&-1\\ 0&-3&4 \end{pmatrix}. \]

We know that $\operatorname{rk}(A) = \operatorname{rk}(A') = 2$, the number of non zero rows in $A'$. Since $\dim(\mathbb{R}^2) = 2$, the map $f$ is surjective by Proposition 7.7.3.

We have seen in Proposition 7.4.2 that if a linear map $f$ is an isomorphism, then its domain and image have the same dimension. Injectivity and surjectivity of a linear map provide necessary conditions on the relative dimensions of the domain and the codomain of the map.

Remark 7.7.5 Let $f : V \to W$ be a linear map. One has:

(a) If $f$ is injective, then $\dim(V) \le \dim(W)$. This claim easily follows from Lemma 7.3.5, since the images under $f$ of a basis for $V$ give linearly independent vectors in $W$.

(b) If $f$ is surjective, then $\dim(V) \ge \dim(W)$. This claim follows from Lemma 7.3.3, since the images under $f$ of a basis for $V$ generate (that is, they linearly span) $W$.

Remark 7.7.6 Let $f : V \to W$ be a linear map, with $A$ its corresponding matrix with respect to any pair of bases. One has:

(a) With $\dim(V) < \dim(W)$, $f$ is injective if and only if $\operatorname{rk}(A)$ is maximal;

(b) With $\dim(V) > \dim(W)$, $f$ is surjective if and only if $\operatorname{rk}(A)$ is maximal;

(c) With $\dim(V) = \dim(W)$, $f$ is an isomorphism if and only if $\operatorname{rk}(A)$ is maximal.
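
As an illustration of the two criteria and of the remark above, the following Python sketch computes the rank by Gaussian elimination and reads off injectivity and surjectivity from it. The function names `rank` and `classify` are hypothetical helpers of ours; the first matrix is the one of Exercise 7.7.4, while its transpose is used only as an additional test case.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, computed by Gaussian elimination on exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for j in range(len(m[0])):
        # Look for a pivot in column j among the rows not used yet.
        pivot = next((i for i in range(r, len(m)) if m[i][j] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][j] / m[r][j]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(rows):
    """Return (injective, surjective) for the map x -> A x, with A of size m x n."""
    dim_w, dim_v = len(rows), len(rows[0])
    rk = rank(rows)
    return rk == dim_v, rk == dim_w

A = [[1, 1, -1], [2, -1, 2]]                # f : R^3 -> R^2 of Exercise 7.7.4
A_transposed = [list(col) for col in zip(*A)]   # a map R^2 -> R^3, illustration only

print(classify(A))             # (injective, surjective) for A
print(classify(A_transposed))  # (injective, surjective) for the transpose
```

The sketch only restates Propositions 7.7.1 and 7.7.3: injectivity is the condition $\operatorname{rk}(A) = \dim(V)$ (number of columns), surjectivity the condition $\operatorname{rk}(A) = \dim(W)$ (number of rows).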

Exercise 7.7.7 The following linear maps are represented with respect to canonical bases.

(1) Let the map $f : \mathbb{R}^3 \to \mathbb{R}^4$ be defined by

\[ (x,y,z) \mapsto (x - y + 2z,\; y + z,\; -x + z,\; 2x + y). \]

To compute the rank of the corresponding matrix $A$ with respect to the canonical bases, as usual we reduce it by rows. We have

\[ A = \begin{pmatrix} 1&-1&2\\ 0&1&1\\ -1&0&1\\ 2&1&0\end{pmatrix} \mapsto \begin{pmatrix}1&-1&2\\ 0&1&1\\ 0&0&1\\ 0&0&0\end{pmatrix} \]

and the rank of $A$ is maximal, $\operatorname{rk}(A) = 3$. Since $\dim(V) < \dim(W)$, $f$ is injective.

(2) Let the map $f : \mathbb{R}^4 \to \mathbb{R}^3$ be defined by

\[ (x,y,z,t) \mapsto (x - y + 2z + t,\; y + z + 3t,\; x - y + 2z + 2t). \]

We proceed as above and compute, via the following reduction,

\[ A = \begin{pmatrix} 1&-1&2&1\\ 0&1&1&3\\ 1&-1&2&2\end{pmatrix} \mapsto \begin{pmatrix} 1&-1&2&1\\ 0&1&1&3\\ 0&0&0&1\end{pmatrix}, \]

that $\operatorname{rk}(A) = 3$. Since $\operatorname{rk}(A)$ is maximal, with $\dim(V) > \dim(W)$, $f$ turns out to be surjective.

(3) Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be represented as before by the matrix


which,  by  reduction,  becomes 


A  i-> 


whose rank is clearly maximal. Thus $f$ is an isomorphism since $\dim(V) = \dim(W)$.


7.8  Composition  of  Linear  Maps 

We  rephrase  the  general  Definition  7.2.3  of  composing  maps. 

Definition 7.8.1 Let $f : V \to W$ and $g : W \to Z$ be two linear maps between real vector spaces. The composition between $g$ and $f$ is the map

\[ g \circ f : V \to Z \]

defined as $(g \circ f)(v) = g(f(v))$, for any $v \in V$.

Proposition 7.8.2 If $f : V \to W$ and $g : W \to Z$ are linear maps, the composition map $g \circ f : V \to Z$ is linear as well.

Proof For any $v, v' \in V$ and $\lambda, \lambda' \in \mathbb{R}$, the linearity of both $f$ and $g$ allows one to write:

\begin{align*}
(g \circ f)(\lambda v + \lambda' v') &= g(f(\lambda v + \lambda' v')) \\
&= g(\lambda f(v) + \lambda' f(v')) \\
&= \lambda\, g(f(v)) + \lambda'\, g(f(v')) \\
&= \lambda\, (g \circ f)(v) + \lambda'\, (g \circ f)(v'),
\end{align*}

showing the linearity of the composition map. □

The following proposition, whose proof we omit, characterises the matrix corresponding to the composition of two linear maps.

Proposition 7.8.3 Let $V, W, Z$ be real vector spaces with bases $\mathcal{B}, \mathcal{C}, \mathcal{D}$ respectively. Given linear maps $f : V \to W$ and $g : W \to Z$, the corresponding matrices with respect to the given bases are related by

\[ M_{g\circ f}^{\mathcal{D},\mathcal{B}} = M_g^{\mathcal{D},\mathcal{C}} \cdot M_f^{\mathcal{C},\mathcal{B}}. \]
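
The relation between composition and matrix product can be tested on a small example. The sketch below is only an illustration in Python with canonical bases: `matrix_of` and `matmul` are helper names of ours, $f$ is the map of Exercise 7.7.4 and $g$ is a hypothetical map chosen for the purpose.

```python
def matrix_of(linear_map, dim_domain):
    """Matrix, in canonical bases, whose columns are the images of e_1, ..., e_n."""
    columns = []
    for j in range(dim_domain):
        e_j = [1 if i == j else 0 for i in range(dim_domain)]
        columns.append(linear_map(e_j))
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

def matmul(a, b):
    """Row by column product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def f(v):                     # f : R^3 -> R^2, the map of Exercise 7.7.4
    x, y, z = v
    return [x + y - z, 2 * x - y + 2 * z]

def g(w):                     # g : R^2 -> R^3, hypothetical, for illustration only
    a, b = w
    return [a, a + b, 3 * b]

def g_after_f(v):             # the composition g o f : R^3 -> R^3
    return g(f(v))

M_f = matrix_of(f, 3)
M_g = matrix_of(g, 2)
M_gf = matrix_of(g_after_f, 3)

print(M_gf == matmul(M_g, M_f))   # the two sides of Proposition 7.8.3
```

Note the order of the factors: the matrix of $g \circ f$ is the product of the matrix of $g$ times the matrix of $f$, not the reverse (the reverse product need not even be defined).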

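The following theorem characterises an isomorphism in terms of its corresponding matrix.

Theorem 7.8.4 Let $f : V \to W$ be a linear map. The map $f$ is an isomorphism if and only if for any choice of the bases $\mathcal{B}$ for $V$ and $\mathcal{C}$ for $W$, the corresponding matrix $M_f^{\mathcal{C},\mathcal{B}}$ with respect to the given bases is invertible, with

\[ M_{f^{-1}}^{\mathcal{B},\mathcal{C}} = \left( M_f^{\mathcal{C},\mathcal{B}} \right)^{-1}. \]

Proof Let us assume that $f$ is an isomorphism: we can then write $\dim(V) = \dim(W)$, so $M_f^{\mathcal{C},\mathcal{B}}$ is a square matrix whose size is $n \times n$ (say). From Proposition 7.4.2 we know that $f^{-1}$ exists as a linear map whose corresponding matrix, with the given bases, will be $M_{f^{-1}}^{\mathcal{B},\mathcal{C}}$. From Proposition 7.8.3 we can write

\[ M_{f^{-1}}^{\mathcal{B},\mathcal{C}} \cdot M_f^{\mathcal{C},\mathcal{B}} = M_{f^{-1}\circ f}^{\mathcal{B},\mathcal{B}} = M_{\mathrm{id}_V}^{\mathcal{B},\mathcal{B}} = I_n \quad\Longrightarrow\quad M_{f^{-1}}^{\mathcal{B},\mathcal{C}} = \left( M_f^{\mathcal{C},\mathcal{B}} \right)^{-1}. \]

For the converse, we set now $A = M_f^{\mathcal{C},\mathcal{B}}$. By hypothesis $A$ is a square invertible matrix, with inverse $A^{-1}$, so we can consider the linear map

\[ g = f_{A^{-1}}^{\mathcal{B},\mathcal{C}} : W \to V. \]

In order to show that $g$ is the inverse of $f$, consider the matrix corresponding to $g \circ f$ with respect to the basis $\mathcal{B}$. From Proposition 7.8.3,

\[ M_{g\circ f}^{\mathcal{B},\mathcal{B}} = M_g^{\mathcal{B},\mathcal{C}} \cdot M_f^{\mathcal{C},\mathcal{B}} = A^{-1} \cdot A = I_n. \]

Since linear maps are in bijection with matrices, we have that $g \circ f = \mathrm{id}_V$. Along the same lines we can show that $f \circ g = \mathrm{id}_W$, thus proving $g = f^{-1}$. □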

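Exercise 7.8.5 Consider the linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ defined by

\[ f((x,y,z)) = (x - y + z,\; 2y + z,\; z). \]

With the canonical basis $\mathcal{E}$ for $\mathbb{R}^3$ the corresponding matrix is

\[ A = M_f^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 1&-1&1\\ 0&2&1\\ 0&0&1 \end{pmatrix}. \]

Since $\operatorname{rk}(A) = 3$, $f$ is an isomorphism, with $f^{-1}$ the linear map corresponding to $A^{-1}$. From Proposition 5.3.3, with $\det(A) = 2$, we have

\[ A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} 2&1&-3\\ 0&1&-1\\ 0&0&2\end{pmatrix} = \begin{pmatrix} 1 & 1/2 & -3/2\\ 0&1/2&-1/2\\ 0&0&1\end{pmatrix} = M_{f^{-1}}^{\mathcal{E},\mathcal{E}}. \]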
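
The inverse matrix of this exercise can be double-checked numerically. The sketch below uses Gauss-Jordan elimination instead of the cofactor formula of Proposition 5.3.3; this is a swapped-in technique, chosen only because it is short to write. The names `inverse` and `apply` are ours.

```python
from fractions import Fraction

def inverse(rows):
    """Inverse of a square matrix by Gauss-Jordan elimination (exact arithmetic)."""
    n = len(rows)
    # Augment A with the identity matrix: [A | I].
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for j in range(n):
        pivot = next((i for i in range(j, n) if m[i][j] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        m[j], m[pivot] = m[pivot], m[j]
        m[j] = [x / m[j][j] for x in m[j]]          # make the pivot equal to 1
        for i in range(n):
            if i != j:                              # clear column j elsewhere
                factor = m[i][j]
                m[i] = [a - factor * b for a, b in zip(m[i], m[j])]
    return [row[n:] for row in m]                   # right block is the inverse

def apply(matrix, v):
    """Action of a matrix on a vector of components."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

A = [[1, -1, 1], [0, 2, 1], [0, 0, 1]]
A_inv = inverse(A)

v = [3, -2, 5]                                      # an arbitrary test vector
print(apply(A_inv, apply(A, v)) == v)               # f^{-1}(f(v)) = v
```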


7.9  Change  of  Basis  in  a  Vector  Space 

In this section we study how to relate the components of the same vector in a vector space with respect to different bases. This problem has a natural counterpart in physics, where different bases for the same vector space represent different reference systems, that is, different observers measuring observables of the same physical system in a compatible way.

Example 7.9.1 We start by considering the vector space $\mathbb{R}^2$ with two bases given by

\[ \mathcal{E} = (e_1 = (1,0),\; e_2 = (0,1)), \qquad \mathcal{B} = (b_1 = (1,2),\; b_2 = (3,4)). \]

Any vector $v \in \mathbb{R}^2$ will then be written as

\[ v = (x_1, x_2)_{\mathcal{B}} = (y_1, y_2)_{\mathcal{E}}, \]

or, more explicitly,

\[ v = x_1 b_1 + x_2 b_2 = y_1 e_1 + y_2 e_2. \]

By writing the components of the elements in $\mathcal{B}$ in the basis $\mathcal{E}$, that is

\[ b_1 = e_1 + 2e_2, \qquad b_2 = 3e_1 + 4e_2, \]

we have

\begin{align*}
y_1 e_1 + y_2 e_2 &= x_1(e_1 + 2e_2) + x_2(3e_1 + 4e_2) \\
&= (x_1 + 3x_2)\,e_1 + (2x_1 + 4x_2)\,e_2.
\end{align*}

We have then obtained

\[ y_1 = x_1 + 3x_2, \qquad y_2 = 2x_1 + 4x_2. \]


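These expressions can be written in matrix form,

\[ \begin{pmatrix} y_1\\ y_2\end{pmatrix} = \begin{pmatrix} x_1 + 3x_2\\ 2x_1 + 4x_2\end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} y_1\\ y_2\end{pmatrix} = \begin{pmatrix}1&3\\ 2&4\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}. \]

Such a relation can be written as

\[ \begin{pmatrix} y_1\\ y_2\end{pmatrix} = A \begin{pmatrix}x_1\\ x_2\end{pmatrix} \]

where

\[ A = \begin{pmatrix}1&3\\ 2&4\end{pmatrix}. \]

Notice that the columns of $A$ above are given by the components of the vectors in $\mathcal{B}$ with respect to the basis $\mathcal{E}$. We have the following general result.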
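
A short numerical check of this example may help. The sketch below is ours and not part of the original notes: `to_canonical` applies the matrix $A$ of the example, while `to_basis_b` goes in the opposite direction by solving $A\,x = y$ with Cramer's rule for the $2 \times 2$ case (an assumption of convenience; any method for solving the linear system would do).

```python
from fractions import Fraction

A = [[1, 3], [2, 4]]   # columns: components of b1 and b2 in the canonical basis

def to_canonical(x):
    """Components (y1, y2) in E of the vector with components (x1, x2) in B."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]

def to_basis_b(y):
    """Components (x1, x2) in B of the vector with components (y1, y2) in E."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x1 = Fraction(y[0] * A[1][1] - A[0][1] * y[1], det)
    x2 = Fraction(A[0][0] * y[1] - y[0] * A[1][0], det)
    return [x1, x2]

print(to_canonical([1, 1]))   # b1 + b2 in canonical components
print(to_basis_b([1, 0]))     # e1 expressed in the basis B
```

For instance, $b_1 + b_2 = (4, 6)$, and $e_1 = -2\,b_1 + b_2$, as one verifies directly from $b_1 = (1,2)$, $b_2 = (3,4)$.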

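Proposition 7.9.2 Let $V$ be a real vector space with $\dim(V) = n$. Let $\mathcal{B}$ and $\mathcal{C}$ be two bases for $V$ and denote by $(x_1,\dots,x_n)_{\mathcal{B}}$ and $(y_1,\dots,y_n)_{\mathcal{C}}$ the components of the same vector $v$ with respect to them. One has

\[ \begin{pmatrix} y_1\\ \vdots\\ y_n\end{pmatrix} = M_{\mathrm{id}_V}^{\mathcal{C},\mathcal{B}} \begin{pmatrix} x_1\\ \vdots \\ x_n\end{pmatrix}. \]

Such an expression will also be written as

\[ {}^t(y_1,\dots,y_n) = M_{\mathrm{id}_V}^{\mathcal{C},\mathcal{B}} \cdot {}^t(x_1,\dots,x_n). \]

Proof This is clear, by recalling Definition 7.1.12 and Proposition 7.1.13. □

Definition 7.9.3 The matrix $M^{\mathcal{C},\mathcal{B}} = M_{\mathrm{id}_V}^{\mathcal{C},\mathcal{B}}$ is called the matrix of the change of basis from $\mathcal{B}$ to $\mathcal{C}$. The columns of this matrix are given by the components with respect to $\mathcal{C}$ of the vectors in $\mathcal{B}$.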

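Exercise 7.9.4 Let $\mathcal{B} = (v_1, v_2, v_3)$ and $\mathcal{C} = (w_1, w_2, w_3)$ be two different bases for $\mathbb{R}^3$, with

\[ v_1 = (0,1,-1), \quad v_2 = (1,0,-1), \quad v_3 = (2,-2,2), \]

\[ w_1 = (0,1,1), \quad w_2 = (1,0,1), \quad w_3 = (1,1,0). \]

We consider the vector $v = (1,-1,1)_{\mathcal{B}}$ and we wish to determine its components with respect to $\mathcal{C}$. The solution to the linear system

\begin{align*}
v_1 &= a_{11} w_1 + a_{21} w_2 + a_{31} w_3 \\
v_2 &= a_{12} w_1 + a_{22} w_2 + a_{32} w_3 \\
v_3 &= a_{13} w_1 + a_{23} w_2 + a_{33} w_3
\end{align*}

gives the entries for the matrix of the change of basis, which is found to be

\[ M^{\mathcal{C},\mathcal{B}} = \begin{pmatrix} 0&-1&-1\\ -1&0&3\\ 1&1&-1 \end{pmatrix}. \]

We can then write

\[ v = M^{\mathcal{C},\mathcal{B}} \begin{pmatrix} 1\\ -1\\ 1\end{pmatrix} = \begin{pmatrix} 0\\ 2\\ -1\end{pmatrix}, \qquad \text{that is } v = (0,2,-1)_{\mathcal{C}}. \]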
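
The linear system of this exercise can be solved by machine, one vector of $\mathcal{B}$ at a time. The following Python sketch assumes exact arithmetic with `fractions.Fraction`; the helper name `components` is ours, and Gauss-Jordan elimination is just one possible way to solve the system.

```python
from fractions import Fraction

def components(basis, v):
    """Solve a1*c1 + ... + an*cn = v for (a1, ..., an), with (c1, ..., cn) a basis."""
    n = len(basis)
    # Augmented matrix: row i holds the i-th coordinates of c1, ..., cn and of v.
    m = [[Fraction(c[i]) for c in basis] + [Fraction(v[i])] for i in range(n)]
    for j in range(n):
        pivot = next(i for i in range(j, n) if m[i][j] != 0)
        m[j], m[pivot] = m[pivot], m[j]
        m[j] = [x / m[j][j] for x in m[j]]
        for i in range(n):
            if i != j:
                factor = m[i][j]
                m[i] = [a - factor * b for a, b in zip(m[i], m[j])]
    return [row[n] for row in m]

B = [(0, 1, -1), (1, 0, -1), (2, -2, 2)]    # v1, v2, v3
C = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]       # w1, w2, w3

# Column j of M^{C,B} holds the components of v_j with respect to C.
columns = [components(C, v) for v in B]
M_CB = [list(row) for row in zip(*columns)]

x_B = [1, -1, 1]                            # components of v with respect to B
x_C = [sum(a * x for a, x in zip(row, x_B)) for row in M_CB]
```

Here `M_CB` reproduces the matrix of the change of basis and `x_C` the components of $v$ with respect to $\mathcal{C}$; both bases describe the same vector $(1,-1,2)$ of $\mathbb{R}^3$.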


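Theorem 7.9.5 Let $\mathcal{B}$ and $\mathcal{C}$ be two bases for the vector space $V$ over $\mathbb{R}$. The matrix $M^{\mathcal{C},\mathcal{B}}$ is invertible, with

\[ \left( M^{\mathcal{C},\mathcal{B}} \right)^{-1} = M^{\mathcal{B},\mathcal{C}}. \]

Proof This easily follows by applying Theorem 7.8.4 to $M^{\mathcal{C},\mathcal{B}} = M_{\mathrm{id}_V}^{\mathcal{C},\mathcal{B}}$, since $\mathrm{id}_V = \mathrm{id}_V^{-1}$. □

Theorem 7.9.6 Let $A \in \mathbb{R}^{n,n}$ be an invertible matrix. Denoting by $v_1, \dots, v_n$ the column vectors in $A$ and setting $\mathcal{B} = (v_1, \dots, v_n)$, it holds that:

(i) $\mathcal{B}$ is a basis for $\mathbb{R}^n$,

(ii) $A = M^{\mathcal{E},\mathcal{B}}$ with $\mathcal{E}$ the canonical basis in $\mathbb{R}^n$.

Proof (i) From Remark 7.7.6, we know that $A$ has maximal rank, that is $\operatorname{rk}(A) = n$. Being the column vectors of $A$, the system $v_1, \dots, v_n$ is then free. A system of $n$ linearly independent vectors in $\mathbb{R}^n$ is indeed a basis for $\mathbb{R}^n$ (see Corollary 2.5.5 in Chap. 2).

(ii) It directly follows from Definition 7.9.3.

□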


Remark 7.9.7 From Theorems 7.9.5 and 7.9.6 we have that the group $GL(n, \mathbb{R})$ of invertible matrices of order $n$ is the same as (the group of) matrices providing a change of basis in $\mathbb{R}^n$.

Exercise 7.9.8 The matrix

\[ A = \begin{pmatrix} 1&1&-1\\ 0&1&2\\ 0&0&1 \end{pmatrix} \]

is invertible since $\operatorname{rk}(A) = 3$ (the matrix $A$ is reduced by rows, so its rank is the number of non zero rows). The column vectors in $A$, that is

\[ v_1 = (1,0,0), \quad v_2 = (1,1,0), \quad v_3 = (-1,2,1), \]

form a basis for $\mathbb{R}^3$. It is also clear that $A = M^{\mathcal{E},\mathcal{B}} = M_{\mathrm{id}_{\mathbb{R}^3}}^{\mathcal{E},\mathcal{B}}$, with $\mathcal{B} = (v_1, v_2, v_3)$.

We next turn to study how $M_f^{\mathcal{C},\mathcal{B}}$, the matrix associated to a linear map $f : V \to W$ with respect to the bases $\mathcal{B}$ for $V$ and $\mathcal{C}$ for $W$, is transformed under a change of basis in $V$ and $W$. In the following pages we shall denote by $V_{\mathcal{B}}$ the vector space $V$ referred to its basis $\mathcal{B}$.

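Theorem 7.9.9 Let $\mathcal{B}$ and $\mathcal{B}'$ be two bases for the real vector space $V$ and $\mathcal{C}$ and $\mathcal{C}'$ two bases for the real vector space $W$. With $f : V \to W$ a linear map, one has that

\[ M_f^{\mathcal{C}',\mathcal{B}'} = M^{\mathcal{C}',\mathcal{C}} \cdot M_f^{\mathcal{C},\mathcal{B}} \cdot M^{\mathcal{B},\mathcal{B}'}. \]

Proof The commutative diagram

\[
\begin{array}{ccc}
V_{\mathcal{B}} & \xrightarrow{\;\;f\;\;} & W_{\mathcal{C}} \\
{\scriptstyle \mathrm{id}_V}\big\uparrow & & \big\downarrow{\scriptstyle \mathrm{id}_W} \\
V_{\mathcal{B}'} & \xrightarrow{\;\;f\;\;} & W_{\mathcal{C}'}
\end{array}
\]

shows the claim: going from $V_{\mathcal{B}'}$ to $W_{\mathcal{C}'}$ along the bottom line is equivalent to going around the diagram, that is

\[ f = \mathrm{id}_W \circ f \circ \mathrm{id}_V. \]

Such a relation can be translated in a matrix form,

\[ M_f^{\mathcal{C}',\mathcal{B}'} = M_{\mathrm{id}_W \circ f \circ \mathrm{id}_V}^{\mathcal{C}',\mathcal{B}'} \]

and, by recalling Proposition 7.8.3, we have

\[ M_{\mathrm{id}_W \circ f \circ \mathrm{id}_V}^{\mathcal{C}',\mathcal{B}'} = M_{\mathrm{id}_W}^{\mathcal{C}',\mathcal{C}} \cdot M_f^{\mathcal{C},\mathcal{B}} \cdot M_{\mathrm{id}_V}^{\mathcal{B},\mathcal{B}'}, \]

which proves the claim.

□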


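Exercise 7.9.10 Consider the linear map $f : \mathbb{R}^2_{\mathcal{B}} \to \mathbb{R}^3_{\mathcal{C}}$ whose corresponding matrix is

\[ A = M_f^{\mathcal{C},\mathcal{B}} = \begin{pmatrix} 1&2\\ -1&0\\ 1&1 \end{pmatrix} \]

with respect to $\mathcal{B} = ((1,1),(0,1))$ and $\mathcal{C} = ((1,1,0),(1,0,1),(0,1,1))$. We determine the matrix $B = M_f^{\mathcal{E}_3,\mathcal{E}_2}$, with $\mathcal{E}_2$ the canonical basis for $\mathbb{R}^2$ and $\mathcal{E}_3$ the canonical basis for $\mathbb{R}^3$. The commutative diagram above turns out to be

\[
\begin{array}{ccc}
\mathbb{R}^2_{\mathcal{B}} & \xrightarrow{\;\;f = f_A^{\mathcal{C},\mathcal{B}}\;\;} & \mathbb{R}^3_{\mathcal{C}} \\
{\scriptstyle \mathrm{id}_{\mathbb{R}^2}}\big\uparrow & & \big\downarrow{\scriptstyle \mathrm{id}_{\mathbb{R}^3}} \\
\mathbb{R}^2_{\mathcal{E}_2} & \xrightarrow{\;\;f = f_B^{\mathcal{E}_3,\mathcal{E}_2}\;\;} & \mathbb{R}^3_{\mathcal{E}_3}
\end{array}
\]

which reads

\[ B = M_f^{\mathcal{E}_3,\mathcal{E}_2} = M^{\mathcal{E}_3,\mathcal{C}} \, A \, M^{\mathcal{B},\mathcal{E}_2}. \]

We have to compute the matrices $M^{\mathcal{E}_3,\mathcal{C}}$ and $M^{\mathcal{B},\mathcal{E}_2}$. Clearly,

\[ M^{\mathcal{E}_3,\mathcal{C}} = \begin{pmatrix} 1&1&0\\ 1&0&1\\ 0&1&1 \end{pmatrix} \]

and, from Theorem 7.9.5, it is

\[ M^{\mathcal{B},\mathcal{E}_2} = \left( M^{\mathcal{E}_2,\mathcal{B}} \right)^{-1} = \begin{pmatrix} 1&0\\ 1&1 \end{pmatrix}^{-1} = \begin{pmatrix} 1&0\\ -1&1 \end{pmatrix} \]

(the last equality follows from Proposition 5.3.3). We have then

\[ B = \begin{pmatrix} 1&1&0\\ 1&0&1\\ 0&1&1 \end{pmatrix} \begin{pmatrix} 1&2\\ -1&0\\ 1&1 \end{pmatrix} \begin{pmatrix} 1&0\\ -1&1 \end{pmatrix} = \begin{pmatrix} -2&2\\ -1&3\\ -1&1 \end{pmatrix}. \]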
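
The product $M^{\mathcal{E}_3,\mathcal{C}}\, A\, M^{\mathcal{B},\mathcal{E}_2}$ of this exercise can be evaluated with a few lines of Python. The sketch below is ours: the variable names are illustrative, the $2 \times 2$ inverse is computed with the cofactor formula, and the last lines give an independent check that uses only $e_1 = b_1 - b_2$.

```python
from fractions import Fraction

def matmul(a, b):
    """Row by column product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2], [-1, 0], [1, 1]]                 # M_f^{C,B}
M_E3_C = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]    # columns: the vectors of C
M_E2_B = [[1, 0], [1, 1]]                     # columns: the vectors of B

# Invert the 2 x 2 matrix M^{E2,B} with the cofactor formula.
(p, q), (r, s) = M_E2_B
det = p * s - q * r
M_B_E2 = [[Fraction(s, det), Fraction(-q, det)],
          [Fraction(-r, det), Fraction(p, det)]]

B_matrix = matmul(matmul(M_E3_C, A), M_B_E2)  # M_f^{E3,E2}

# Independent check: e1 = b1 - b2 has components (1, -1) with respect to B,
# so f(e1) has components A (1, -1)^t with respect to C.
f_e1_in_C = [row[0] - row[1] for row in A]
f_e1 = [sum(m * c for m, c in zip(row, f_e1_in_C)) for row in M_E3_C]
print(f_e1 == [row[0] for row in B_matrix])   # first column of M_f^{E3,E2}
```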


We  close  this  section  by  studying  how  to  construct  linear  maps  with  specific 
properties. 

Exercise 7.9.11 We ask whether there is a linear map $f : \mathbb{R}^3 \to \mathbb{R}^3$ which fulfils the conditions $f(1,0,2) = 0$ and $\operatorname{Im}(f) = \mathcal{L}((1,1,0),(2,-1,0))$. Also, if such a map exists, is it unique?

We start by setting $v_1 = (1,0,2)$, $v_2 = (1,1,0)$, $v_3 = (2,-1,0)$. Since a linear map is characterised by its action on the elements of a basis and $v_1$ is required to be in the kernel of $f$, we complete $v_1$ to a basis for $\mathbb{R}^3$. By using the elements of the canonical basis $\mathcal{E}_3$, we may take the set $(v_1, e_1, e_2)$, which is indeed a basis: the matrix

\[ \begin{pmatrix} 1&0&2\\ 1&0&0\\ 0&1&0 \end{pmatrix}, \]

whose rows are given by $(v_1, e_1, e_2)$, has rank 3 (the matrix is already reduced by rows). So we can take the basis $\mathcal{B} = (v_1, e_1, e_2)$ and define

\[ f(v_1) = 0_{\mathbb{R}^3}, \qquad f(e_1) = v_2, \qquad f(e_2) = v_3. \]

Such a linear map satisfies the required conditions, since $\ker(f) = \mathcal{L}(v_1)$ and

\[ \operatorname{Im}(f) = \mathcal{L}(f(v_1), f(e_1), f(e_2)) = \mathcal{L}(v_2, v_3). \]

With respect to the bases $\mathcal{E}$ and $\mathcal{B}$ we have

\[ M_f^{\mathcal{E},\mathcal{B}} = \begin{pmatrix} 0&1&2\\ 0&1&-1\\ 0&0&0 \end{pmatrix}. \]

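In order to understand whether the required conditions can be satisfied by a different linear map, we start by analysing whether the set $(v_1, v_2, v_3)$ itself provides a basis for $\mathbb{R}^3$. As usual, we reduce by rows the matrix associated to the vectors,

\[ A = \begin{pmatrix} 1&0&2\\ 1&1&0\\ 2&-1&0 \end{pmatrix} \mapsto \begin{pmatrix} 1&0&2\\ 1&1&0\\ 3&0&0 \end{pmatrix}. \]

Such a reduction gives $\operatorname{rk}(A) = 3$, that is $\mathcal{C} = (v_1, v_2, v_3)$ is a basis for $\mathbb{R}^3$. Then, let $g : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by

\[ g(v_1) = 0_{\mathbb{R}^3}, \qquad g(v_2) = v_2, \qquad g(v_3) = v_3. \]

Also the linear map $g$ satisfies the conditions we set at the beginning and the matrix

\[ M_g^{\mathcal{C},\mathcal{C}} = \begin{pmatrix} 0&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix} \]

represents its action with respect to the basis $\mathcal{C}$. It seems clear that $f$ and $g$ are different. In order to prove this claim, we shall see that their corresponding matrices with respect to the same pair of bases differ. We need then to find $M_g^{\mathcal{E},\mathcal{B}}$. We know that $M_g^{\mathcal{E},\mathcal{B}} = M_g^{\mathcal{E},\mathcal{C}}\, M^{\mathcal{C},\mathcal{B}}$, with

\[ M_g^{\mathcal{E},\mathcal{C}} = \begin{pmatrix} 0&1&2\\ 0&1&-1\\ 0&0&0 \end{pmatrix}, \]

since the column vectors in $M_g^{\mathcal{E},\mathcal{C}}$ are given by $g(v_1)$, $g(v_2)$, $g(v_3)$. The columns of the matrix $M^{\mathcal{C},\mathcal{B}}$ are indeed the components with respect to $\mathcal{C}$ of the vectors in $\mathcal{B}$, that is

\[ v_1 = v_1, \qquad e_1 = \tfrac{1}{3}\, v_2 + \tfrac{1}{3}\, v_3, \qquad e_2 = \tfrac{2}{3}\, v_2 - \tfrac{1}{3}\, v_3. \]

Thus we have

\[ M^{\mathcal{C},\mathcal{B}} = \frac{1}{3} \begin{pmatrix} 3&0&0\\ 0&1&2\\ 0&1&-1 \end{pmatrix}, \]

and in turn,

\[ M_g^{\mathcal{E},\mathcal{B}} = M_g^{\mathcal{E},\mathcal{C}}\, M^{\mathcal{C},\mathcal{B}} = \frac{1}{3} \begin{pmatrix} 0&1&2\\ 0&1&-1\\ 0&0&0 \end{pmatrix} \begin{pmatrix} 3&0&0\\ 0&1&2\\ 0&1&-1 \end{pmatrix} = \begin{pmatrix} 0&1&0\\ 0&0&1\\ 0&0&0 \end{pmatrix}. \]

This shows that $M_g^{\mathcal{E},\mathcal{B}} \neq M_f^{\mathcal{E},\mathcal{B}}$, so that $g \neq f$.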
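
The non-uniqueness can also be seen by evaluating the two maps on the same vectors. The following Python sketch is an illustration of ours: `linear_map` (a hypothetical helper) builds a linear map from the images of a basis, by first computing the components of the argument with respect to that basis.

```python
from fractions import Fraction

def components(basis, v):
    """Components of v with respect to `basis` (Gauss-Jordan, exact arithmetic)."""
    n = len(basis)
    m = [[Fraction(c[i]) for c in basis] + [Fraction(v[i])] for i in range(n)]
    for j in range(n):
        pivot = next(i for i in range(j, n) if m[i][j] != 0)
        m[j], m[pivot] = m[pivot], m[j]
        m[j] = [x / m[j][j] for x in m[j]]
        for i in range(n):
            if i != j:
                factor = m[i][j]
                m[i] = [a - factor * b for a, b in zip(m[i], m[j])]
    return [row[n] for row in m]

def linear_map(basis, images):
    """The linear map sending the k-th vector of `basis` to the k-th image."""
    def apply(v):
        xs = components(basis, v)
        return [sum(x * img[i] for x, img in zip(xs, images))
                for i in range(len(images[0]))]
    return apply

v1, v2, v3 = (1, 0, 2), (1, 1, 0), (2, -1, 0)
e1, e2 = (1, 0, 0), (0, 1, 0)
zero = (0, 0, 0)

f = linear_map([v1, e1, e2], [zero, v2, v3])   # defined on B = (v1, e1, e2)
g = linear_map([v1, v2, v3], [zero, v2, v3])   # defined on C = (v1, v2, v3)

print(f(v1) == g(v1))   # both maps send v1 to the zero vector
print(f(e1) == g(e1))   # but they disagree on e1
```

Both maps kill $v_1$ and have image $\mathcal{L}(v_2, v_3)$, yet $f(e_1) = v_2$ while $g(e_1) = e_1$, in agreement with the second column of the two matrices compared above.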
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8.1  The  Dual  of  a  Vector  Space 

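Let us consider two finite dimensional real vector spaces $V$ and $W$, and denote by $\mathrm{Lin}(V \to W)$ the collection of all linear maps $f : V \to W$. It is easy to show that $\mathrm{Lin}(V \to W)$ is itself a vector space over $\mathbb{R}$. This is with respect to a sum $(f_1 + f_2)$ and a product by a scalar $(\lambda f)$, for any $f_1, f_2, f \in \mathrm{Lin}(V \to W)$ and $\lambda \in \mathbb{R}$, defined pointwise, that is by

\[ (f_1 + f_2)(v) = f_1(v) + f_2(v), \]

\[ (\lambda f)(v) = \lambda f(v) \]

for any $v \in V$. If $\mathcal{B}$ is a basis for $V$ (of dimension $n$) and $\mathcal{C}$ a basis for $W$ (of dimension $m$), the map $\mathrm{Lin}(V \to W) \to \mathbb{R}^{m,n}$ given by

\[ f \mapsto M_f^{\mathcal{C},\mathcal{B}} \]

is an isomorphism of real vector spaces and the following relations

\begin{align*}
M_{f_1 + f_2}^{\mathcal{C},\mathcal{B}} &= M_{f_1}^{\mathcal{C},\mathcal{B}} + M_{f_2}^{\mathcal{C},\mathcal{B}} \\
M_{\lambda f}^{\mathcal{C},\mathcal{B}} &= \lambda\, M_f^{\mathcal{C},\mathcal{B}} \tag{8.1}
\end{align*}

hold (see Proposition 4.1.4). It is then clear that $\dim(\mathrm{Lin}(V \to W)) = mn$.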

In particular, the vector space of linear maps from a vector space $V$ to $\mathbb{R}$, that is the set of linear forms on $V$, deserves a name of its own.

Definition 8.1.1 Given a finite dimensional vector space $V$, the space of linear maps $\mathrm{Lin}(V \to \mathbb{R})$ is called the dual space to $V$ and is denoted by $V^* = \mathrm{Lin}(V \to \mathbb{R})$.

The next result follows from the general discussion above.

Proposition 8.1.2 Given a finite dimensional real vector space $V$, its dual space $V^*$ is a real vector space with $\dim(V^*) = \dim(V)$.

Let $\mathcal{B} = (b_1, \dots, b_n)$ be a basis for $V$. We define elements $\{\varphi_i\}_{i=1,\dots,n}$ in $V^*$ by

\[ \varphi_i(b_j) = \delta_{ij} \quad\text{with}\quad \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \tag{8.2} \]

With $V \ni v = x_1 b_1 + \dots + x_n b_n$, we have for the components that $x_i = \varphi_i(v)$. If $f \in V^*$ we write

\begin{align*}
f(v) &= f(b_1)\, x_1 + \dots + f(b_n)\, x_n \\
&= f(b_1)\, \varphi_1(v) + \dots + f(b_n)\, \varphi_n(v) \\
&= \big( f(b_1)\varphi_1 + \dots + f(b_n)\varphi_n \big)(v).
\end{align*}

This shows that the action of $f$ upon the vector $v$ is the same as the action on $v$ of the linear map $f(b_1)\varphi_1 + \dots + f(b_n)\varphi_n$, that is we have that $V^* = \mathcal{L}(\varphi_1, \dots, \varphi_n)$. It is indeed immediate to prove that, with respect to the linear structure in $V^*$, the linear maps $\varphi_i$ are linearly independent, so they provide a basis for $V^*$. We have then sketched the proof of the following proposition.

Proposition 8.1.3 Given a basis $\mathcal{B}$ for an $n$-dimensional real vector space $V$, the elements $\varphi_i$ defined in (8.2) provide a basis for $V^*$. Such a basis, denoted $\mathcal{B}^*$, is called the dual basis to $\mathcal{B}$.

We can also write

\[ f(v) = \big( f(b_1) \;\dots\; f(b_n) \big) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{8.3} \]

Referring to Definition 7.1.12 (and implicitly fixing a basis for $W = \mathbb{R}$), the relation (8.3) provides us the single row matrix $M_f^{\mathcal{B}} = ( f(b_1) \;\dots\; f(b_n) )$ associated to $f$ with respect to the basis $\mathcal{B}$ for $V$. Its entries are the images under $f$ of the basis elements in $\mathcal{B}$. The proof of the proposition above shows that such entries are the components of $f \in V^*$ with respect to the dual basis $\mathcal{B}^*$.
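
To make the dual basis concrete, here is a minimal Python sketch for $V = \mathbb{R}^2$. It is an illustration of ours, under the following assumptions: the basis is $b_1 = (1,2)$, $b_2 = (3,4)$ (the one of Example 7.9.1), the dual basis elements are computed with Cramer's rule as the component functions $x_i = \varphi_i(v)$, and the linear form `f` is hypothetical.

```python
from fractions import Fraction

b1, b2 = (1, 2), (3, 4)          # a basis B of R^2

def det2(u, w):
    """Determinant of the 2 x 2 matrix with columns u and w."""
    return u[0] * w[1] - u[1] * w[0]

def phi1(v):
    """First element of the dual basis: the b1-component of v."""
    return Fraction(det2(v, b2), det2(b1, b2))

def phi2(v):
    """Second element of the dual basis: the b2-component of v."""
    return Fraction(det2(b1, v), det2(b1, b2))

def f(v):                        # a hypothetical linear form on R^2
    return 5 * v[0] - 2 * v[1]

def f_via_dual_basis(v):
    """Evaluate f as f(b1) phi1 + f(b2) phi2, the expansion in the dual basis."""
    return f(b1) * phi1(v) + f(b2) * phi2(v)

print(phi1(b1), phi1(b2), phi2(b1), phi2(b2))   # the values delta_ij of (8.2)
print(f_via_dual_basis((2, -3)) == f((2, -3)))
```

The single row matrix of this `f` with respect to $\mathcal{B}$ is $(f(b_1)\; f(b_2)) = (1 \;\; 7)$, and these two numbers are exactly its components on the dual basis.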

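Let $\mathcal{B}'$ be another basis for $V$, with elements $\{b'_i\}_{i=1,\dots,n}$. With

\[ v = x_1 b_1 + \dots + x_n b_n = x'_1 b'_1 + \dots + x'_n b'_n \]

we can write, following Definition 7.9.3,

\[ x'_k = \sum_{s=1}^n (M^{\mathcal{B}',\mathcal{B}})_{ks}\, x_s, \qquad b_i = \sum_{j=1}^n (M^{\mathcal{B}',\mathcal{B}})_{ji}\, b'_j \]

or, in a matrix notation,

\[ \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix} = M^{\mathcal{B}',\mathcal{B}} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad (b_1 \;\dots\; b_n) = (b'_1 \;\dots\; b'_n)\, M^{\mathcal{B}',\mathcal{B}}. \tag{8.4} \]

From Theorem 7.9.9 we have the matrix associated to $f$ with respect to $\mathcal{B}'$,

\[ M_f^{\mathcal{B}'} = M_f^{\mathcal{B}}\, M^{\mathcal{B},\mathcal{B}'}, \]

which we write as

\[ \big( f(b_1) \;\dots\; f(b_n) \big)\, M^{\mathcal{B},\mathcal{B}'} = \big( f(b'_1) \;\dots\; f(b'_n) \big). \tag{8.5} \]

Since the entries of $M_f^{\mathcal{B}'}$ provide the components of the element $f \in V^*$ with respect to the basis $\mathcal{B}'^*$, a comparison between (8.4) and (8.5) shows that, under a change of basis $\mathcal{B} \to \mathcal{B}'$ for $V$ and the corresponding change of the dual basis in $V^*$, the components of a vector in $V^*$ are transformed under a map which is the inverse of the map that transforms the components of a vector in $V$.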

The  above  is  usually  referred  to  by  saying  that  the  transformation  law  for  vectors 
in  V*  is  contravariant  with  respect  to  the  covariant  one  for  vectors  in  V.  In  Sect.  13.3 
we  shall  describe  these  facts  with  an  important  physical  example,  the  study  of  the 
electromagnetic  field. 

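If we express $f \in V^*$ with respect to the dual bases $\mathcal{B}^*$ and $\mathcal{B}'^*$ as

\[ f = \sum_{i=1}^n f(b_i)\, \varphi_i = \sum_{k=1}^n f(b'_k)\, \varphi'_k \]

and consider the rule for the change of basis, we have

\[ \sum_{k,i=1}^n (M^{\mathcal{B}',\mathcal{B}})_{ki}\, f(b'_k)\, \varphi_i = \sum_{k=1}^n f(b'_k)\, \varphi'_k. \]

Since this must be valid for any $f \in V^*$, we can write the transformation law $\mathcal{B}^* \to \mathcal{B}'^*$:

\[ \varphi'_k = \sum_{i=1}^n (M^{\mathcal{B}',\mathcal{B}})_{ki}\, \varphi_i \qquad\text{that is}\qquad \begin{pmatrix} \varphi'_1 \\ \vdots \\ \varphi'_n \end{pmatrix} = M^{\mathcal{B}',\mathcal{B}} \begin{pmatrix} \varphi_1 \\ \vdots \\ \varphi_n \end{pmatrix}. \]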

It is straightforward to extend to the complex case, mutatis mutandis, all the results of the present chapter given above. In particular, one has the following natural definition.

Definition 8.1.4 Let $V$ be a finite dimensional complex vector space. The set $V^* = \mathrm{Lin}(V \to \mathbb{C})$ is called the dual space to $V$.

Indeed, the space $V^*$ is a complex vector space, with $\dim(V^*) = \dim(V)$, and a natural extension of (8.2) to the complex case allows one to introduce a dual basis $\mathcal{B}^*$ for any basis $\mathcal{B}$ of $V$.

Also, we could consider linear maps between finite dimensional complex vector spaces. In the next section we shall explicitly consider linear transformations of the complex vector space $\mathbb{C}^n$.


8.2  The  Dirac’s  Bra-Ket  Formalism 

Referring to Sect. 3.4 let us denote by $H^n = (\mathbb{C}^n, \cdot)$ the canonical hermitian vector space. Following Dirac (and by now a standard practice in textbooks on quantum mechanics), the hermitian product is denoted as

\[ \langle\;|\;\rangle : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}, \qquad \langle z | w \rangle = \bar{z}_1 w_1 + \dots + \bar{z}_n w_n, \]

for any $z = (z_1, \dots, z_n)$, $w = (w_1, \dots, w_n) \in \mathbb{C}^n$. Thus its properties (see Proposition 3.4.2) are written as follows. For any $z, w, v \in \mathbb{C}^n$ and $a, b \in \mathbb{C}$,

(i) $\langle w | z \rangle = \overline{\langle z | w \rangle}$,

(ii) $\langle az + bw | v \rangle = \bar{a} \langle z | v \rangle + \bar{b} \langle w | v \rangle$ while $\langle v | az + bw \rangle = a \langle v | z \rangle + b \langle v | w \rangle$,

(iii) $\langle z | z \rangle \ge 0$,

(iv) $\langle z | z \rangle = 0 \;\Leftrightarrow\; z = (0, \dots, 0) \in \mathbb{C}^n$.

Since the hermitian product is bilinear (for the sum), for any fixed $w \in H^n$, the mapping

\[ f_w : v \mapsto \langle w | v \rangle \]

provides indeed a linear map from $\mathbb{C}^n$ to $\mathbb{C}$, that is $f_w$ is an element of the dual space $(\mathbb{C}^n)^*$. Given a hermitian basis $\mathcal{B} = \{e_1, \dots, e_n\}$ for $H^n$, with $w = (w_1, \dots, w_n)_{\mathcal{B}}$ and $z = (z_1, \dots, z_n)_{\mathcal{B}}$, one has

\[ f_w(z) = \bar{w}_1 z_1 + \dots + \bar{w}_n z_n. \]

The corresponding dual basis $\mathcal{B}^* = \{\varepsilon_1, \dots, \varepsilon_n\}$ for $(\mathbb{C}^n)^*$ is defined in analogy to (8.2) for the real case by taking $\varepsilon_i(e_j) = \delta_{ij}$. In terms of the hermitian product, these linear maps can be defined as $\varepsilon_i(z) = \langle e_i | z \rangle$. Then, to any $w = (w_1, \dots, w_n)_{\mathcal{B}}$ we can associate an element $f_w = \bar{w}_1 \varepsilon_1 + \dots + \bar{w}_n \varepsilon_n$ in $(\mathbb{C}^n)^*$, whose action on $\mathbb{C}^n$ can be written as

\[ f_w(v) = \langle w | v \rangle. \]

Thus, via the hermitian product, to any vector $w \in \mathbb{C}^n$ one associates a unique dual element $f_w \in (\mathbb{C}^n)^*$; vice versa, to any element $f \in (\mathbb{C}^n)^*$ one associates a unique element $w \in \mathbb{C}^n$ in such a way that $f = f_w$:

\[ w = w_1 e_1 + \dots + w_n e_n \quad\leftrightarrow\quad f_w = \bar{w}_1 \varepsilon_1 + \dots + \bar{w}_n \varepsilon_n. \]

Remark 8.2.1 Notice that this bijection between $\mathbb{C}^n$ and $(\mathbb{C}^n)^*$ is anti-linear (for the product by complex numbers), since we have to complex conjugate the components of the vectors in order to satisfy the defining requirement of the hermitian product in $H^n$, that is

\[ f_{\lambda w} = \bar{\lambda} f_w, \qquad \text{for } \lambda \in \mathbb{C},\; w \in \mathbb{C}^n. \]

For the canonical euclidean space $E^n$ one could proceed in a similar manner and in such a case the bijection between $E^n$ and its dual $(E^n)^*$ given by the euclidean product is linear.

Given the bijection above, Dirac's idea was to split the hermitian product bracket. Any element $w \in H^n$ provides a ket element $|w\rangle$ and a bra element $\langle w| \in (\mathbb{C}^n)^*$. A basis for $H^n$ is then written as made of elements $|e_j\rangle$ while the bra elements $\langle e_j|$ form the dual basis for $(\mathbb{C}^n)^*$, with

\begin{align*}
w = w_1 e_1 + \dots + w_n e_n \quad&\leftrightarrow\quad |w\rangle = w_1 |e_1\rangle + \dots + w_n |e_n\rangle, \\
f_w = \bar{w}_1 \varepsilon_1 + \dots + \bar{w}_n \varepsilon_n \quad&\leftrightarrow\quad \langle w| = \bar{w}_1 \langle e_1| + \dots + \bar{w}_n \langle e_n|.
\end{align*}

The action of a bra element on a ket element is just given as a bra-ket juxtaposition, with

\[ f_w(z) = \langle w | z \rangle \in \mathbb{C}. \]

We are now indeed allowed to define a ket-bra juxtaposition, that is we have elements $T = |z\rangle\langle w|$. The action of such a $T$ from the left upon a $|u\rangle$ is then defined as

\[ T : |u\rangle \mapsto |z\rangle \langle w | u \rangle. \]

Since $\langle w | u \rangle$ is a complex number, we see that for this action the element $T$ maps a ket vector linearly into a ket vector, so $T$ is a linear map from $H^n$ to $H^n$.

Definition 8.2.2 With $z, w \in H^n$, the ket-bra element $T = |z\rangle\langle w|$ is the linear operator whose action is defined as $v \mapsto T(v) = \langle w | v \rangle z = (w \cdot v) z$.
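
The bra-ket and ket-bra operations are easy to experiment with. The Python sketch below is an illustration of ours, using the built-in complex numbers; it assumes that the hermitian product conjugates its first argument, in agreement with the anti-linearity stated in Remark 8.2.1. The names `braket` and `ketbra` and the sample vectors are hypothetical.

```python
def braket(w, v):
    """Hermitian product <w|v>: the components of the first vector are conjugated."""
    return sum(complex(wi).conjugate() * vi for wi, vi in zip(w, v))

def ketbra(z, w):
    """The operator T = |z><w|, returned as the function v -> <w|v> z."""
    def T(v):
        c = braket(w, v)              # the complex number <w|v>
        return [c * zi for zi in z]   # multiplies the ket |z>
    return T

z = [1 + 1j, 2]
w = [1j, 1]
v = [2, 3 - 1j]

T = ketbra(z, w)
print(braket(w, v))   # the number <w|v>
print(T(v))           # the vector <w|v> z
```

With these data $\langle w | v \rangle = 3 - 3i$, so that $T(v) = (3 - 3i)\,z = (6,\; 6 - 6i)$.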

It is then natural to consider linear combinations of the form $T = \sum_{k,s=1}^n T_{ks}\, |e_k\rangle\langle e_s|$, with $T_{ks} \in \mathbb{C}$ the entries of a matrix $T \in \mathbb{C}^{n,n}$, so to compute

\[ T|e_j\rangle = \sum_{k,s=1}^n T_{ks}\, |e_k\rangle \langle e_s | e_j \rangle = \sum_{k=1}^n T_{kj}\, |e_k\rangle, \qquad T_{kj} = \langle e_k | T(e_j) \rangle. \tag{8.6} \]

In order to relate this formalism to the one we have already developed in this chapter, consider a linear map $\phi : H^n \to H^n$ and its associated matrix $M_\phi^{\mathcal{B},\mathcal{B}} = (a_{ks})$ with respect to a given hermitian basis $\mathcal{B} = (e_1, \dots, e_n)$. From Propositions 7.1.13 and 7.1.14 it is easy to show that one has

\[ a_{kj} = e_k \cdot (\phi(e_j)) = \langle e_k | \phi(e_j) \rangle. \tag{8.7} \]

The analogy between (8.6) and (8.7) shows that, for a fixed basis of $H^n$, the action of a linear map $\phi$ with associated matrix $A = M_\phi^{\mathcal{B},\mathcal{B}} = (a_{ks})$ is equivalently written as the action of the operator

\[ T_\phi \,(= T_A) = \sum_{k,s=1}^n a_{ks}\, |e_k\rangle\langle e_s| \]

in Dirac's notation. The association $A \to T_A$ is indeed an isomorphism of (complex) vector spaces of dimension $n^2$.

Next, let $\phi$, $\psi$ be two linear maps on $H^n$ with associated matrices $A$, $B$ with respect to the hermitian basis $\mathcal{B}$. They correspond to the operators that we write as $T_A = \sum_{r,s=1}^n a_{rs}\, |e_r\rangle\langle e_s|$ and $T_B = \sum_{j,k=1}^n b_{jk}\, |e_j\rangle\langle e_k|$. With a natural juxtaposition we write the composition of the linear maps as

\begin{align*}
\phi \circ \psi &= \sum_{r,s=1}^n \sum_{j,k=1}^n a_{rs}\, b_{jk}\, |e_r\rangle \langle e_s | e_j \rangle \langle e_k| \\
&= \sum_{r,k=1}^n \Big( \sum_{j=1}^n a_{rj}\, b_{jk} \Big) |e_r\rangle\langle e_k|.
\end{align*}

We see that the matrix associated, via the isomorphism $A \to T_A$ above, to the composition $\phi \circ \psi$ has entries $(r,k)$ given by $\sum_{j=1}^n a_{rj} b_{jk}$, thus coinciding with the row by column product between the matrices $A$ and $B$ associated to $\phi$ and $\psi$, that is

\[ T_{AB} = T_A T_B. \]

Thus, Proposition 7.8.3 for composition of matrices associated to linear maps is valid when we represent linear maps on $H^n$ using Dirac's notation.
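
The identity $T_{AB} = T_A T_B$ and the formula (8.6) for the entries can be tested on a small case. In the Python sketch below, which is ours, the operator $T_A$ is built literally as the sum of ket-bra terms on the canonical hermitian basis of $\mathbb{C}^2$; the matrices `A`, `B` and the vector `v` are hypothetical data.

```python
def braket(w, v):
    """Hermitian product <w|v>, conjugating the first argument."""
    return sum(complex(wi).conjugate() * vi for wi, vi in zip(w, v))

def operator_from_matrix(a):
    """T_A = sum over k, s of a_ks |e_k><e_s| on the canonical hermitian basis."""
    n = len(a)
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    def T(v):
        result = [0] * n
        for k in range(n):
            for s in range(n):
                coefficient = a[k][s] * braket(basis[s], v)    # a_ks <e_s|v>
                result = [x + coefficient * e for x, e in zip(result, basis[k])]
        return result
    return T

def matmul(a, b):
    """Row by column product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 1j], [0, 2]]
B = [[2, 0], [1 - 1j, 1]]
T_A = operator_from_matrix(A)
T_B = operator_from_matrix(B)
T_AB = operator_from_matrix(matmul(A, B))

v = [1 + 2j, -1j]
print(T_A(T_B(v)) == T_AB(v))   # composition of operators versus matrix product
```

The entries of the matrix are recovered from the operator as $a_{kj} = \langle e_k | T_A(e_j) \rangle$, which is the content of (8.6) and (8.7).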

All of this section clearly has a real version and could be repeated for the (real) euclidean space $E^n$ with its linear maps and associated real matrices $T \in \mathbb{R}^{n,n}$.

Endomorphisms and Diagonalization

Both in classical and quantum physics, and in several branches of mathematics, it is hard to overestimate the role that the notion of diagonal action of a linear map has. The aim of this chapter is to introduce this topic which will be crucial in all the following chapters.


9.1  Endomorphisms 

Definition 9.1.1 Let $V$ be a real vector space. A linear map $\phi : V \to V$ is called an endomorphism of $V$. The set of all endomorphisms of $V$ is denoted $\mathrm{End}(V)$. Non invertible endomorphisms are also called singular or degenerate.

As seen in Sect. 8.1, the set $\mathrm{End}(V)$ is a real vector space with $\dim(\mathrm{End}(V)) = n^2$ if $\dim(V) = n$.

The question we address now is whether there exists a class of bases of the vector space $V$, with respect to which a matrix $M_\phi^{\mathcal{B},\mathcal{B}}$ has a particular (diagonal, say) form. We start with a definition.

Definition 9.1.2 The matrices $A, B \in \mathbb{R}^{n,n}$ are called similar if there exists a real vector space $V$ and an endomorphism $\phi \in \mathrm{End}(V)$ such that $A = M_\phi^{\mathcal{B},\mathcal{B}}$ and $B = M_\phi^{\mathcal{C},\mathcal{C}}$, where $\mathcal{B}$ and $\mathcal{C}$ are bases for $V$. We denote similar matrices by $A \sim B$.

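Similarity between matrices can be described in a purely algebraic way.

Proposition 9.1.3 The matrices $A, B \in \mathbb{R}^{n,n}$ are similar if and only if there exists an invertible matrix $P \in GL(n)$, such that $P^{-1} A P = B$.

Proof Let us assume $A \sim B$: we then have a real vector space $V$, bases $\mathcal{B}$ and $\mathcal{C}$ for it and an endomorphism $\phi \in \mathrm{End}(V)$ such that $A = M_\phi^{\mathcal{B},\mathcal{B}}$ and $B = M_\phi^{\mathcal{C},\mathcal{C}}$. From Theorem 7.9.9 we have

\[ B = M^{\mathcal{C},\mathcal{B}}\, A\, M^{\mathcal{B},\mathcal{C}}. \]

Since the matrix $M^{\mathcal{C},\mathcal{B}}$ is invertible, with $(M^{\mathcal{C},\mathcal{B}})^{-1} = M^{\mathcal{B},\mathcal{C}}$, the claim follows with $P = M^{\mathcal{B},\mathcal{C}}$.

Next, let us assume there exists a matrix $P \in GL(n)$ such that $P^{-1} A P = B$. From Theorem 7.9.6 and Remark 7.9.7 we know that the invertible matrix $P$ gives a change of basis in $\mathbb{R}^n$: there exists a basis $\mathcal{C}$ for $\mathbb{R}^n$ (the columns of $P$), with $P = M^{\mathcal{E},\mathcal{C}}$ and $P^{-1} = M^{\mathcal{C},\mathcal{E}}$. Let $\phi = f_A^{\mathcal{E},\mathcal{E}}$ be the endomorphism in $\mathbb{R}^n$ corresponding to the matrix $A$ with respect to the canonical bases, $A = M_\phi^{\mathcal{E},\mathcal{E}}$. We then have

\begin{align*}
B &= P^{-1} A P \\
&= M^{\mathcal{C},\mathcal{E}}\, M_\phi^{\mathcal{E},\mathcal{E}}\, M^{\mathcal{E},\mathcal{C}} \\
&= M_\phi^{\mathcal{C},\mathcal{C}}.
\end{align*}

This shows that $B$ corresponds to the endomorphism $\phi$ with respect to the different basis $\mathcal{C}$, that is $A$ and $B$ are similar. □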

Remark 9.1.4 The similarity we have introduced is an equivalence relation in $\mathbb{R}^{n,n}$,
since it is

(a) reflexive, that is $A \sim A$ since $A = I_n A I_n$,

(b) symmetric, that is $A \sim B \Rightarrow B \sim A$ since

$$P^{-1}AP = B \ \Rightarrow \ P B P^{-1} = A,$$

(c) transitive, that is $A \sim B$ and $B \sim C$ imply $A \sim C$, since $P^{-1}AP = B$ and
$Q^{-1}BQ = C$ clearly imply $Q^{-1}P^{-1}APQ = (PQ)^{-1}A(PQ) = C$.

If $A \in \mathbb{R}^{n,n}$, we denote its equivalence class by similarity as
$[A] = \{B \in \mathbb{R}^{n,n} : B \sim A\}$.
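The three properties in Remark 9.1.4 can be checked on a small numerical example. The following is only a sketch: the matrices `A`, `P`, `Q` and the helper names `matmul`, `inverse_2x2` and `conjugate` are illustrative choices, not taken from the text, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(P):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def conjugate(P, A):
    """Return P^{-1} A P, the matrix similar to A through P."""
    return matmul(matmul(inverse_2x2(P), A), P)

F = Fraction
A = [[F(2), F(1)], [F(0), F(3)]]    # hypothetical sample matrix
P = [[F(1), F(1)], [F(0), F(1)]]    # hypothetical invertible matrices
Q = [[F(1), F(0)], [F(2), F(1)]]
I2 = [[F(1), F(0)], [F(0), F(1)]]

B = conjugate(P, A)                 # A ~ B
C = conjugate(Q, B)                 # B ~ C

print(conjugate(I2, A) == A)                # reflexivity
print(conjugate(inverse_2x2(P), B) == A)    # symmetry: P B P^{-1} = A
print(conjugate(matmul(P, Q), A) == C)      # transitivity: (PQ)^{-1} A (PQ) = C
```

Each of the three checks mirrors one item of the remark, with the matrix realising the similarity passed as the first argument of `conjugate`.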

Proposition 9.1.5 Let the matrices $A, B \in \mathbb{R}^{n,n}$ be similar. Then

$$\det(B) = \det(A) \quad \text{and} \quad \mathrm{tr}(B) = \mathrm{tr}(A).$$

Proof From Proposition 9.1.3 we know there exists an invertible matrix $P \in \mathrm{GL}(n)$
such that $P^{-1}AP = B$. From the Binet Theorem 5.1.16 and Proposition 4.5.2 we
can write

$$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \det(P^{-1})\det(P)\det(A) = \det(A)$$

and

$$\mathrm{tr}(B) = \mathrm{tr}(P^{-1}AP) = \mathrm{tr}(PP^{-1}A) = \mathrm{tr}(A).$$

□

A natural question is whether, for a given $A$, the equivalence class $[A]$ contains a
diagonal element (equivalently, whether $A$ is similar to a diagonal matrix).

Definition 9.1.6 A matrix $A \in \mathbb{R}^{n,n}$ is called diagonalisable if it is similar to a
diagonal matrix ($\Delta$, say), that is if there is a diagonal matrix $\Delta$ in the equivalence
class $[A]$.

Such  a  definition  has  a  counterpart  in  terms  of  endomorphisms. 

Definition 9.1.7 An endomorphism $\phi \in \mathrm{End}(V)$ is called simple if there exists a
basis $\mathcal{B}$ for $V$ such that the matrix $M_\phi^{\mathcal{B},\mathcal{B}}$ is diagonalisable.

We expect that being simple is an intrinsic property of an endomorphism, one which
does not depend on the basis with respect to which its corresponding matrix is given.
The following proposition confirms this point.

Proposition 9.1.8 Let $V$ be a real vector space, with $\phi \in \mathrm{End}(V)$. The following
are equivalent:

(i) $\phi$ is simple, that is there is a basis $\mathcal{B}$ for $V$ such that
$M_\phi^{\mathcal{B},\mathcal{B}}$ is diagonalisable,

(ii) there exists a basis $\mathcal{C}$ for $V$ such that $M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal,

(iii) given any basis $\mathcal{D}$ for $V$, the matrix $M_\phi^{\mathcal{D},\mathcal{D}}$ is diagonalisable.

Proof (i) $\Rightarrow$ (ii): Since $M_\phi^{\mathcal{B},\mathcal{B}}$ is similar to a diagonal matrix
$\Delta$, from the proof of Proposition 9.1.3 we know that there is a basis $\mathcal{C}$ with respect
to which $\Delta = M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal.

(ii) $\Rightarrow$ (iii): Let $\mathcal{C}$ be a basis of $V$ such that
$M_\phi^{\mathcal{C},\mathcal{C}} = \Delta$ is diagonal. For any basis $\mathcal{D}$
we then have $M_\phi^{\mathcal{D},\mathcal{D}} \sim \Delta$, thus $M_\phi^{\mathcal{D},\mathcal{D}}$
is diagonalisable.

(iii) $\Rightarrow$ (i): obvious.

□


9.2  Eigenvalues  and  Eigenvectors 

Remark 9.2.1 Let $\phi : V \to V$ be a simple endomorphism, with
$\Delta = M_\phi^{\mathcal{C},\mathcal{C}}$ a diagonal matrix associated to $\phi$. It is then

$$\Delta = \begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{pmatrix},$$

for scalars $\lambda_j \in \mathbb{R}$, with $j = 1, \dots, n$. By setting
$\mathcal{C} = (v_1, \dots, v_n)$, we then write

$$\phi(v_j) = \lambda_j v_j.$$

The vectors of the basis $\mathcal{C}$ and the scalars $\lambda_j$ play a prominent role in the analysis
of endomorphisms. This motivates the following definition.

Definition 9.2.2 Let $\phi \in \mathrm{End}(V)$ with $V$ a real vector space. If there exists a
non-zero vector $v \in V$ and a scalar $\lambda \in \mathbb{R}$ such that

$$\phi(v) = \lambda v,$$

then $\lambda$ is called an eigenvalue of $\phi$ and $v$ is called an eigenvector of $\phi$ associated to
$\lambda$. The spectrum of an endomorphism is the collection of its eigenvalues.

Remark 9.2.3 Let $\phi \in \mathrm{End}(V)$ and $\mathcal{C}$ be a basis of $V$. With the definition above,
the content of Remark 9.2.1 can be rephrased as follows:

(a) $M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal if and only if $\mathcal{C}$ is a basis of eigenvectors for $\phi$,

(b) $\phi$ is simple if and only if $V$ has a basis of eigenvectors for $\phi$ (from
Definition 9.1.7).

Notice that each eigenvector $v$ for an endomorphism $\phi$ is uniquely associated to
an eigenvalue $\lambda$ of $\phi$. On the other hand, more than one eigenvector can be associated
to a given eigenvalue $\lambda$. It is indeed easy to see that, if $v$ is associated to $\lambda$, also
$\alpha v$, with $0 \neq \alpha \in \mathbb{R}$, is associated to the same $\lambda$, since
$\phi(\alpha v) = \alpha\phi(v) = \alpha(\lambda v) = \lambda(\alpha v)$.

Proposition 9.2.4 If $V$ is a real vector space, $\phi \in \mathrm{End}(V)$ and $\lambda \in \mathbb{R}$, the set

$$V_\lambda = \{v \in V : \phi(v) = \lambda v\}$$

is a vector subspace of $V$.

Proof We explicitly check that $V_\lambda$ is closed under linear combinations. With
$v_1, v_2 \in V_\lambda$ and $a_1, a_2 \in \mathbb{R}$, we can write

$$\phi(a_1 v_1 + a_2 v_2) = a_1\phi(v_1) + a_2\phi(v_2) = a_1\lambda v_1 + a_2\lambda v_2 = \lambda(a_1 v_1 + a_2 v_2),$$

showing that $V_\lambda$ is a vector subspace of $V$. □

Definition 9.2.5 If $\lambda \in \mathbb{R}$ is an eigenvalue of $\phi \in \mathrm{End}(V)$, the space
$V_\lambda$ is called the eigenspace corresponding to $\lambda$.

Remark 9.2.6 It is easy to see that if $\lambda \in \mathbb{R}$ is not an eigenvalue for the endomorphism
$\phi$, then the set $V_\lambda = \{v \in V : \phi(v) = \lambda v\}$ contains only the zero vector.
Conversely, it is clear that, if $V_\lambda$ contains the zero vector only, then $\lambda$ is not an
eigenvalue for $\phi$. We have that $\lambda \in \mathbb{R}$ is an eigenvalue for $\phi$ if and only if
$\dim(V_\lambda) \geq 1$.

Exercise 9.2.7 Let $\phi \in \mathrm{End}(\mathbb{R}^2)$ be defined by $\phi((x, y)) = (y, x)$.
Is $\lambda = 2$ an eigenvalue for $\phi$? The corresponding set $V_2$ would then be

$$V_2 = \{v \in \mathbb{R}^2 : \phi(v) = 2v\} = \{(x, y) \in \mathbb{R}^2 : (y, x) = 2(x, y)\},$$

that is, $V_2$ would be given by the solutions of the system

$$\begin{cases} y = 2x \\ x = 2y \end{cases}
\quad \Rightarrow \quad
\begin{cases} y = 2x \\ x = 4x \end{cases}$$

whose only solution is $x = y = 0$. Since $V_2 = \{(0, 0)\}$, we conclude that $\lambda = 2$ is not
an eigenvalue for $\phi$.


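Exercise 9.2.8 The endomorphism $\phi \in \mathrm{End}(\mathbb{R}^2)$ given by
$\phi((x, y)) = (2x, 3y)$ is simple, since the corresponding matrix with respect to the
canonical basis $\mathcal{E} = (e_1, e_2)$ is diagonal,

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.$$

Its eigenvalues are $\lambda_1 = 2$ (with eigenvector $e_1$) and $\lambda_2 = 3$ (with eigenvector $e_2$).
The corresponding eigenspaces are then $V_2 = \mathcal{L}(e_1)$ and $V_3 = \mathcal{L}(e_2)$.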

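Exercise 9.2.9 We consider again the endomorphism $\phi((x, y)) = (y, x)$ in $\mathbb{R}^2$ given
in Exercise 9.2.7. We ask whether it is simple. We start by noticing that its
corresponding matrix with respect to the canonical basis is the following,

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

which is not diagonal. We then look for a basis (if it exists) with respect to which
the matrix corresponding to $\phi$ is diagonal. Recalling Remark 9.2.3, we look
for a basis of $\mathbb{R}^2$ made up of eigenvectors for $\phi$. In order for $v = (a, b)$ to be an
eigenvector for $\phi$, there must exist a real scalar $\lambda$ such that $\phi((a, b)) = \lambda(a, b)$,
that is

$$\begin{cases} b = \lambda a \\ a = \lambda b \end{cases}.$$

It follows that the eigenvalues, if they exist, must fulfill the condition $\lambda^2 = 1$. For
$\lambda = 1$ the corresponding eigenspace is

$$V_1 = \{(x, y) \in \mathbb{R}^2 : \phi((x, y)) = (x, y)\} = \{(x, x) \in \mathbb{R}^2\} = \mathcal{L}((1, 1)).$$

And for $\lambda = -1$ the corresponding eigenspace is

$$V_{-1} = \{(x, y) \in \mathbb{R}^2 : \phi((x, y)) = -(x, y)\} = \{(x, -x) \in \mathbb{R}^2\} = \mathcal{L}((1, -1)).$$

Since the vectors $(1, 1)$, $(1, -1)$ form a basis $\mathcal{B}$ for $\mathbb{R}^2$ with respect to which the
matrix of $\phi$ is

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

we conclude that $\phi$ is simple. We expect
$M_\phi^{\mathcal{B},\mathcal{B}} \sim M_\phi^{\mathcal{E},\mathcal{E}}$, since they are associated
to the same endomorphism; the algebraic proof of this claim is easy. By defining

$$P = M^{\mathcal{E},\mathcal{B}} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

the matrix of the change of basis, we compute explicitly,

$$\frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

that is $P^{-1} M_\phi^{\mathcal{E},\mathcal{E}} P = M_\phi^{\mathcal{B},\mathcal{B}}$ (see Proposition 9.1.3).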

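Not every endomorphism is simple, as the following exercise shows.

Exercise 9.2.10 The endomorphism $\phi$ in $\mathbb{R}^2$ defined as $\phi((x, y)) = (-y, x)$ is not
simple. For $v = (a, b)$ to be an eigenvector, the condition $\phi((a, b)) = \lambda(a, b)$ would be
equivalent to $(-b, a) = \lambda(a, b)$, leading to $\lambda^2 = -1$ whenever $(a, b) \neq (0, 0)$. Since
this equation has no real solution, the only possibility in $\mathbb{R}$ is $a = b = 0$: the
endomorphism $\phi$ has no eigenvectors, and so it is not simple.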

Proposition 9.2.11 Let $V$ be a real vector space with $\phi \in \mathrm{End}(V)$. If $\lambda_1, \lambda_2$
are distinct eigenvalues, any two corresponding eigenvectors, $0 \neq v_1 \in V_{\lambda_1}$ and
$0 \neq v_2 \in V_{\lambda_2}$, are linearly independent. Also, the sum $V_{\lambda_1} + V_{\lambda_2}$ is direct.

Proof Let us assume that $v_2 = \alpha v_1$, with $\mathbb{R} \ni \alpha \neq 0$. By applying the linear map
$\phi$ to both members, we have $\phi(v_2) = \alpha\phi(v_1)$. Since $v_1$ and $v_2$ are eigenvectors with
eigenvalues $\lambda_1$ and $\lambda_2$,

$$\phi(v_1) = \lambda_1 v_1, \qquad \phi(v_2) = \lambda_2 v_2,$$

and the relation $\phi(v_2) = \alpha\phi(v_1)$, using $v_2 = \alpha v_1$, becomes

$$\lambda_2 v_2 = \alpha(\lambda_1 v_1) = \lambda_1(\alpha v_1) = \lambda_1 v_2,$$

that is

$$(\lambda_2 - \lambda_1) v_2 = 0_V.$$

Since $\lambda_2 \neq \lambda_1$, this would lead to the contradiction $v_2 = 0_V$. We therefore conclude
that $v_1$ and $v_2$ are linearly independent.

For the last claim we use Proposition 2.2.13 and show that
$V_{\lambda_1} \cap V_{\lambda_2} = \{0_V\}$. If $v \in V_{\lambda_1} \cap V_{\lambda_2}$, we could write both
$\phi(v) = \lambda_1 v$ (since $v \in V_{\lambda_1}$) and $\phi(v) = \lambda_2 v$ (since $v \in V_{\lambda_2}$):
it would then be $\lambda_1 v = \lambda_2 v$, that is $(\lambda_1 - \lambda_2)v = 0_V$. From the
hypothesis $\lambda_1 \neq \lambda_2$, we would get $v = 0_V$. □

The following proposition is proven along the same lines.

Proposition 9.2.12 Let $V$ be a real vector space, with $\phi \in \mathrm{End}(V)$. Let
$\lambda_1, \dots, \lambda_s \in \mathbb{R}$ be distinct eigenvalues of $\phi$, with
$0_V \neq v_j \in V_{\lambda_j}$, $j = 1, \dots, s$, corresponding eigenvectors. The set
$\{v_1, \dots, v_s\}$ is free, and the sum $V_{\lambda_1} + \cdots + V_{\lambda_s}$ is direct.

Corollary 9.2.13 If $\phi$ is an endomorphism of the real vector space $V$, with $\dim(V) = n$,
then $\phi$ has at most $n$ distinct eigenvalues.

Proof If $\phi$ had $s > n$ distinct eigenvalues, there would exist a set $v_1, \dots, v_s$ of
non-zero corresponding eigenvectors. From the proposition above, such a system would
be free, thus contradicting the fact that the dimension of $V$ is $n$. □

Remark 9.2.14 Let $\phi$ and $\psi$ be two commuting endomorphisms, that is they are
such that $\phi(\psi(v)) = \psi(\phi(v))$ for any $v \in V$. If $v \in V_\lambda$ is an eigenvector for $\phi$
corresponding to $\lambda$, it follows that

$$\phi(\psi(v)) = \psi(\phi(v)) = \lambda\,\psi(v).$$

Thus the endomorphism $\psi$ maps any eigenspace $V_\lambda$ of $\phi$ into itself, and analogously
$\phi$ preserves any eigenspace of $\psi$.

Finding the eigenspaces of an endomorphism amounts to computing suitable kernels.
Let $f : V \to W$ be a linear map between real vector spaces with bases $\mathcal{B}$ and
$\mathcal{C}$. We recall (see Proposition 7.5.1) that if $A = M_f^{\mathcal{C},\mathcal{B}}$ and
$\Sigma : AX = 0$ is the linear system associated to $A$, the map $S_\Sigma \to \ker(f)$ given by

$$(x_1, \dots, x_n) \mapsto (x_1, \dots, x_n)_{\mathcal{B}}$$

is an isomorphism of vector spaces.

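Lemma 9.2.15 If $V$ is a real vector space with basis $\mathcal{B}$, let $\phi \in \mathrm{End}(V)$ and
$\lambda \in \mathbb{R}$. Then

$$V_\lambda = \ker(\phi - \lambda\,\mathrm{id}_V) \cong S_{\Sigma_\lambda},$$

where $S_{\Sigma_\lambda}$ is the space of the solutions of the linear homogeneous system

$$\Sigma_\lambda : \ (M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n)\, X = 0.$$

Proof From Proposition 9.2.4 we write

$$V_\lambda = \{v \in V : \phi(v) = \lambda v\} = \{v \in V : \phi(v) - \lambda v = 0_V\} = \ker(\phi - \lambda\,\mathrm{id}_V).$$

Such a kernel is isomorphic (as recalled above) to the space of solutions of the linear
system given by the matrix $M_{\phi - \lambda\,\mathrm{id}_V}^{\mathcal{B},\mathcal{B}}$, where $\mathcal{B}$ is an
arbitrary basis of $V$. We conclude by noticing that
$M_{\phi - \lambda\,\mathrm{id}_V}^{\mathcal{B},\mathcal{B}} = M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n$. □

Proposition 9.2.16 Let $\phi \in \mathrm{End}(V)$ be an endomorphism of the real vector space
$V$, with $\dim(V) = n$, and let $\lambda \in \mathbb{R}$. The following are equivalent:

(i) $\lambda$ is an eigenvalue for $\phi$,

(ii) $\dim(V_\lambda) \geq 1$,

(iii) $\det(M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n) = 0$ for any basis $\mathcal{B}$ in $V$.

Proof (i) $\Leftrightarrow$ (ii) is the content of Remark 9.2.6;

(ii) $\Leftrightarrow$ (iii). Let $\mathcal{B}$ be an arbitrary basis of $V$, and consider the linear system
$\Sigma_\lambda : (M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n)\, X = 0$. We have

$$\dim(V_\lambda) = \dim(S_{\Sigma_\lambda}) = n - \mathrm{rk}(M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n);$$

the first and the second equality follow from Definition 6.2.1 and Theorem 6.4.3
respectively. From Proposition 5.3.1 we finally write

$$\dim(V_\lambda) \geq 1 \ \Leftrightarrow \ \mathrm{rk}(M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n) < n \ \Leftrightarrow \ \det(M_\phi^{\mathcal{B},\mathcal{B}} - \lambda I_n) = 0,$$

which concludes the proof. □

This proposition shows that the computation of an eigenspace reduces to finding
the kernel of a linear map, a computation which has been described in
Proposition 7.5.1.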
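Criterion (iii) of Proposition 9.2.16 lends itself to a direct check. The sketch below is an illustration only: the function names `det` and `is_eigenvalue` are our own, and the determinant is computed by a plain Laplace expansion, which is adequate for the small matrices of this chapter but not meant as an efficient method.

```python
def det(M):
    """Determinant by Laplace expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_eigenvalue(M, lam):
    """True when det(M - lam * I_n) = 0, as in Proposition 9.2.16 (iii)."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return det(shifted) == 0

# Matrix of (x, y) -> (y, x) with respect to the canonical basis (Exercise 9.2.7).
swap = [[0, 1], [1, 0]]
print(is_eigenvalue(swap, 2))    # False
print(is_eigenvalue(swap, 1))    # True
print(is_eigenvalue(swap, -1))   # True
```

With integer entries the comparison with zero is exact; for floating point entries a tolerance would be needed instead.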

9.3  The  Characteristic  Polynomial  of  an  Endomorphism 


In this section we describe how to compute the eigenvalues of an endomorphism.
These will be the roots of a canonical polynomial associated with the endomorphism.

Definition 9.3.1 Given a square matrix $A \in \mathbb{R}^{n,n}$, the expression

$$p_A(T) = \det(A - T I_n)$$

is a polynomial of degree $n$ in $T$ with real coefficients. Such a polynomial is called
the characteristic polynomial of the matrix $A$.

Exercise 9.3.2 If $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is a square
$2 \times 2$ matrix, then

$$p_A(T) = \begin{vmatrix} a_{11} - T & a_{12} \\ a_{21} & a_{22} - T \end{vmatrix}
= T^2 - (a_{11} + a_{22})\,T + (a_{11}a_{22} - a_{12}a_{21})
= T^2 - \mathrm{tr}(A)\,T + \det(A).$$

If $\lambda_1$ and $\lambda_2$ are the zeros (the roots) of the polynomial $p_A(T)$, with elementary
algebra we write

$$p_A(T) = T^2 - (\lambda_1 + \lambda_2)\,T + \lambda_1\lambda_2,$$

thus obtaining

$$\lambda_1 + \lambda_2 = a_{11} + a_{22} = \mathrm{tr}(A), \qquad \lambda_1\lambda_2 = a_{11}a_{22} - a_{12}a_{21} = \det(A).$$


Proposition 9.3.3 Let $V$ be a real vector space with $\dim(V) = n$, and let
$\phi \in \mathrm{End}(V)$. For any choice of bases $\mathcal{B}$ and $\mathcal{C}$ in $V$, with corresponding
matrices $A = M_\phi^{\mathcal{B},\mathcal{B}}$ and $B = M_\phi^{\mathcal{C},\mathcal{C}}$, it is

$$p_A(T) = p_B(T).$$

Proof We know that $B = P^{-1}AP$, with $P = M^{\mathcal{B},\mathcal{C}}$ the matrix of change of basis.
So we write

$$B - T I_n = P^{-1}AP - P^{-1}(T I_n)P = P^{-1}(A - T I_n)P.$$

From the Binet Theorem 5.1.16 we then have

$$\det(B - T I_n) = \det(P^{-1}(A - T I_n)P) = \det(P^{-1})\det(A - T I_n)\det(P) = \det(A - T I_n),$$

which yields a proof of the claim, since $\det(P^{-1})\det(P) = \det(I_n) = 1$. □

Given a matrix $A \in \mathbb{R}^{n,n}$, an explicit computation of $\det(A - T I_n)$ shows that

$$p_A(T) = (-1)^n T^n + (-1)^{n-1}\,\mathrm{tr}(A)\,T^{n-1} + \cdots + \det(A).$$

The case $n = 2$ is Exercise 9.3.2.

Given $\phi \in \mathrm{End}(V)$, Proposition 9.3.3 shows that the characteristic polynomial
of the matrix associated to $\phi$ does not depend on the given basis of $V$.

Definition 9.3.4 For any matrix $A$ associated to the endomorphism $\phi \in \mathrm{End}(V)$,
the polynomial $p_\phi(T) = p_A(T)$ is called the characteristic polynomial of $\phi$.

From Proposition 9.2.16 and Definition 9.3.4 we have the following result.

Corollary 9.3.5 The eigenvalues of the endomorphism $\phi \in \mathrm{End}(V)$ (the spectrum
of $\phi$) are the real roots of the characteristic polynomial $p_\phi(T)$.
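For a $2 \times 2$ matrix, Corollary 9.3.5 together with the formula of Exercise 9.3.2 gives the eigenvalues as the real roots of $T^2 - \mathrm{tr}(A)\,T + \det(A)$. A minimal sketch follows; the function names are illustrative, and floating point square roots are used, so the results are approximate in general.

```python
import math

def char_poly_2x2(A):
    """Coefficients (1, -tr(A), det(A)) of p_A(T) = T^2 - tr(A) T + det(A)."""
    (a11, a12), (a21, a22) = A
    return (1, -(a11 + a22), a11 * a22 - a12 * a21)

def real_eigenvalues_2x2(A):
    """Real roots of p_A(T), listed with repetition; empty if there are none."""
    _, b, c = char_poly_2x2(A)
    disc = b * b - 4 * c
    if disc < 0:
        return []          # both roots are non-real: no eigenvalues over R
    root = math.sqrt(disc)
    return sorted([(-b - root) / 2, (-b + root) / 2])

print(real_eigenvalues_2x2([[0, 1], [1, 0]]))    # [-1.0, 1.0]
print(real_eigenvalues_2x2([[0, -1], [1, 0]]))   # []
```

The first matrix is that of Exercise 9.2.9, the second that of the map $(x, y) \mapsto (-y, x)$ of Exercise 9.2.10, whose characteristic polynomial $T^2 + 1$ has no real roots.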

Exercise 9.3.6 Let $\phi \in \mathrm{End}(\mathbb{R}^2)$ be associated to the matrix

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Since $p_\phi(T) = T^2 + 1$, the endomorphism has no (real) eigenvalues.

Definition 9.3.7 Let $p(X)$ be a polynomial with real coefficients, and let $\alpha$ be one of
its real roots. From the fundamental theorem of algebra (see Proposition A.5.7) we
know that $(X - \alpha)$ is then a divisor of $p(X)$, and that we have the decomposition

$$p(X) = (X - \alpha)^{m(\alpha)} \cdot q(X),$$

where $q(X)$ is not divisible by $(X - \alpha)$ and $1 \leq m(\alpha)$ is an integer depending on
$\alpha$. Such an integer is called the multiplicity of $\alpha$.

Exercise 9.3.8 Let $p(X) = (X - 2)(X - 3)(X^2 + 1)$. Its real roots are 2 (with
multiplicity $m(2) = 1$, since $(X - 3)(X^2 + 1)$ cannot be divided by $(X - 2)$) and 3 (with
multiplicity $m(3) = 1$). Clearly the polynomial $p(X)$ also has two imaginary roots, given
by $\pm i$.

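Proposition 9.3.9 Let $V$ be a real vector space with $\phi \in \mathrm{End}(V)$. If $\lambda$ is an
eigenvalue for $\phi$ with multiplicity $m(\lambda)$ and eigenspace $V_\lambda$, it holds that

$$1 \leq \dim(V_\lambda) \leq m(\lambda).$$

Proof Let $r = \dim(V_\lambda)$ and $\mathcal{C}$ be a basis of $V_\lambda$. We complete $\mathcal{C}$ to a
basis $\mathcal{B}$ for $V$. We then have $\mathcal{B} = (v_1, \dots, v_r, v_{r+1}, \dots, v_n)$, where the
first elements $v_1, \dots, v_r \in V_\lambda$ are eigenvectors for $\lambda$. The matrix
$M_\phi^{\mathcal{B},\mathcal{B}}$ has the following block form,

$$A = M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix}
\lambda & 0 & \cdots & 0 & a_{1,r+1} & \cdots & a_{1,n} \\
0 & \lambda & \cdots & 0 & a_{2,r+1} & \cdots & a_{2,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda & a_{r,r+1} & \cdots & a_{r,n} \\
0 & 0 & \cdots & 0 & a_{r+1,r+1} & \cdots & a_{r+1,n} \\
0 & 0 & \cdots & 0 & a_{r+2,r+1} & \cdots & a_{r+2,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n,r+1} & \cdots & a_{n,n}
\end{pmatrix}.$$

If $\det(A - T I_n)$ is computed by the Laplace theorem (expanding repeatedly with respect
to the first column, say), we have

$$p_\phi(T) = \det(A - T I_n) = (\lambda - T)^r g(T),$$

where $g(T)$ is the characteristic polynomial of the lower diagonal $(n - r) \times (n - r)$
square block of $A$. We can then conclude that $r \leq m(\lambda)$. □

Definition 9.3.10 The integer $\dim(V_\lambda)$ is called the geometric multiplicity of the
eigenvalue $\lambda$, while $m(\lambda)$ is called the algebraic multiplicity of the eigenvalue $\lambda$.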

Remark 9.3.11 Let $\phi \in \mathrm{End}(V)$.

(a) If $\lambda = 0$ is an eigenvalue for $\phi$, the corresponding eigenspace $V_0$ is $\ker(\phi)$.

(b) If $\lambda \neq 0$ is an eigenvalue for $\phi$, then $V_\lambda \subseteq \mathrm{Im}(\phi)$:
let us indeed consider $0_V \neq v \in V_\lambda$ with $\phi(v) = \lambda v$. Since $\lambda \neq 0$, we divide
by $\lambda$ and write

$$v = \lambda^{-1}\phi(v) = \phi(\lambda^{-1}v) \in \mathrm{Im}(\phi).$$

(c) If $\lambda_1, \dots, \lambda_s$ are pairwise distinct non-zero eigenvalues for $\phi$, from
Proposition 9.2.12 we have the direct sum of the corresponding eigenspaces and

$$V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s} \subseteq \mathrm{Im}(\phi).$$


Exercise 9.3.12 Let $\phi \in \mathrm{End}(\mathbb{R}^4)$ be given by

$$\phi((x, y, z, t)) = (2x + 4y, \ x + 2y, \ -z - 2t, \ z + t).$$

The corresponding matrix with respect to the canonical basis $\mathcal{E}_4$ is

$$A = M_\phi^{\mathcal{E}_4,\mathcal{E}_4} = \begin{pmatrix}
2 & 4 & 0 & 0 \\
1 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 \\
0 & 0 & 1 & 1
\end{pmatrix}.$$

Its characteristic polynomial reads

$$p_\phi(T) = p_A(T) = \det(A - T I_4)
= \begin{vmatrix}
2 - T & 4 & 0 & 0 \\
1 & 2 - T & 0 & 0 \\
0 & 0 & -1 - T & -2 \\
0 & 0 & 1 & 1 - T
\end{vmatrix}
= \begin{vmatrix} 2 - T & 4 \\ 1 & 2 - T \end{vmatrix}
\begin{vmatrix} -1 - T & -2 \\ 1 & 1 - T \end{vmatrix}
= T(T - 4)(T^2 + 1).$$

The eigenvalues of $\phi$ (the real roots of such a polynomial) are $\lambda = 0, 4$. It is easy to
compute that

$$V_0 = \ker(\phi) = \mathcal{L}((-2, 1, 0, 0)),$$

$$V_4 = \ker(\phi - 4 I_4) = \mathcal{L}((2, 1, 0, 0)).$$

This shows that $V_4$ is the only eigenspace corresponding to a non-zero eigenvalue
for $\phi$.

From Theorem 7.6.4 we know that $\dim \mathrm{Im}(\phi) = 4 - \dim \ker(\phi) = 3$, with a
basis of the image of $\phi$ given by 3 linearly independent columns of $A$. It is immediate
to notice that the second column is a multiple of the first one, so we have

$$\mathrm{Im}(\phi) = \mathcal{L}((2, 1, 0, 0), \ (0, 0, -1, 1), \ (0, 0, -2, 1)).$$

It is evident that $V_4 \subset \mathrm{Im}(\phi)$, as shown in general in Remark 9.3.11.
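The eigenvectors stated in Exercise 9.3.12 can be verified directly from the definition $\phi(v) = \lambda v$. The sketch below does just that for the matrix $A$ of the exercise; the helper names `apply` and `is_eigenvector` are our own.

```python
def apply(A, v):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def is_eigenvector(A, v, lam):
    """True when v is non-zero and A v = lam v."""
    return any(v) and apply(A, v) == [lam * x for x in v]

# Matrix of Exercise 9.3.12 with respect to the canonical basis.
A = [[2, 4, 0, 0],
     [1, 2, 0, 0],
     [0, 0, -1, -2],
     [0, 0, 1, 1]]

print(is_eigenvector(A, [-2, 1, 0, 0], 0))   # True: generator of V_0 = ker(phi)
print(is_eigenvector(A, [2, 1, 0, 0], 4))    # True: generator of V_4
print(is_eigenvector(A, [0, 0, 1, 0], 1))    # False
```

The zero vector is excluded explicitly, in agreement with Definition 9.2.2.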
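Exercise 9.3.13 We consider the endomorphism $\phi$ in $\mathbb{R}^4$ given by

$$\phi((x, y, z, t)) = (2x + 4y, \ x + 2y, \ -z, \ z + t),$$

whose corresponding matrix with respect to the canonical basis $\mathcal{E}_4$ is

$$A = M_\phi^{\mathcal{E}_4,\mathcal{E}_4} = \begin{pmatrix}
2 & 4 & 0 & 0 \\
1 & 2 & 0 & 0 \\
0 & 0 & -1 & 0 \\
0 & 0 & 1 & 1
\end{pmatrix}.$$

The characteristic polynomial reads

$$p_\phi(T) = p_A(T) = \det(A - T I_4)
= \begin{vmatrix}
2 - T & 4 & 0 & 0 \\
1 & 2 - T & 0 & 0 \\
0 & 0 & -1 - T & 0 \\
0 & 0 & 1 & 1 - T
\end{vmatrix}
= T(T - 4)(T + 1)(T - 1).$$

The eigenvalues are given by $\lambda = 0, 4, -1, 1$. The corresponding eigenspaces are

$$V_0 = \ker(\phi) = \mathcal{L}((-2, 1, 0, 0)),$$

$$V_4 = \ker(\phi - 4 I_4) = \mathcal{L}((2, 1, 0, 0)),$$

$$V_{-1} = \ker(\phi + I_4) = \mathcal{L}((0, 0, -2, 1)),$$

$$V_1 = \ker(\phi - I_4) = \mathcal{L}((0, 0, 0, 1)),$$

with

$$\mathrm{Im}(\phi) = V_{-1} \oplus V_1 \oplus V_4.$$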


The characteristic polynomial $p_\phi(T)$ of an endomorphism over a real vector space
has real coefficients. If $\lambda_1, \dots, \lambda_s$ are its distinct real roots (that is, the
eigenvalues of $\phi$), we can write

$$p_\phi(T) = (T - \lambda_1)^{m_1} \cdots (T - \lambda_s)^{m_s} \cdot q(T),$$

where $m_j$, $j = 1, \dots, s$, are the algebraic multiplicities and $q(T)$ has no real roots.
We then have

$$\deg(p_\phi(T)) \geq m_1 + \cdots + m_s.$$

This proves the following proposition.

Proposition 9.3.14 Let $V$ be a real vector space with $\dim(V) = n$, and let
$\phi \in \mathrm{End}(V)$. Denoting by $\lambda_1, \dots, \lambda_s$ the distinct eigenvalues of $\phi$, with
corresponding algebraic multiplicities $m_1, \dots, m_s$, one has

$$m_1 + \cdots + m_s \leq n,$$

with the equality holding if and only if every root of $p_\phi(T)$ is real.

□

9.4  Diagonalisation  of  an  Endomorphism 

In this section we describe conditions under which an endomorphism is simple. As we
have seen, this problem is equivalent to studying conditions under which a square matrix
is diagonalisable. The first theorem we prove characterises simple endomorphisms.

Theorem 9.4.1 Let $V$ be a real $n$-dimensional vector space, with $\phi \in \mathrm{End}(V)$.
If $\lambda_1, \dots, \lambda_s$ are the different roots of $p_\phi(T)$ with multiplicities
$m_1, \dots, m_s$, the following claims are equivalent:

(a) $\phi$ is a simple endomorphism,

(b) $V$ has a basis of eigenvectors for $\phi$,

(c) $\lambda_i \in \mathbb{R}$ for any $i = 1, \dots, s$, with
$V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s}$,

(d) $\lambda_i \in \mathbb{R}$ and $m_i = \dim(V_{\lambda_i})$ for any $i = 1, \dots, s$.

When $\phi$ is simple, each basis of $V$ of eigenvectors for $\phi$ contains $m_i$ eigenvectors for
each distinct eigenvalue $\lambda_i$, for $i = 1, \dots, s$.

Proof • (a) $\Leftrightarrow$ (b): this has been shown in Remark 9.2.3.

• (b) $\Rightarrow$ (c): let $\mathcal{B} = (v_1, \dots, v_n)$ be a basis of $V$ of eigenvectors for $\phi$.
Any vector $v_i$ belongs to one of the eigenspaces, so we can write

$$V = \mathcal{L}(v_1, \dots, v_n) \subseteq V_{\lambda_1} + \cdots + V_{\lambda_s},$$

while the opposite inclusion is obvious. Since the sum of eigenspaces corresponding
to distinct eigenvalues is direct (see Proposition 9.2.12), we have
$V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s}$.

• (c) $\Rightarrow$ (b): let $\mathcal{B}_i$ be a basis of $V_{\lambda_i}$ for any $i$. Since $V$ is the direct
sum of all the eigenspaces $V_{\lambda_i}$, the set $\mathcal{B} = \mathcal{B}_1 \cup \dots \cup \mathcal{B}_s$ is a
basis of $V$ made of eigenvectors for $\phi$.

• (c) $\Rightarrow$ (d): from the Grassmann Theorem 2.5.8, we have

$$n = \dim(V) = \dim(V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s})
= \dim(V_{\lambda_1}) + \cdots + \dim(V_{\lambda_s})
\leq m_1 + \cdots + m_s \leq n,$$

where the inequalities follow from Propositions 9.3.9 and 9.3.14. We can then
conclude that $\dim(V_{\lambda_i}) = m(\lambda_i)$ for any $i$.

• (d) $\Rightarrow$ (c): from the hypothesis $m_i = \dim(V_{\lambda_i})$ for any $i = 1, \dots, s$, and
Proposition 9.3.14 we have

$$n = m_1 + \cdots + m_s = \dim(V_{\lambda_1}) + \cdots + \dim(V_{\lambda_s}).$$

We then have $n = \dim(V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s})$, and this equality proves the
claim, since $V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s}$ has dimension $n$ and therefore coincides
with $V$. □

Corollary 9.4.2 If $\lambda_i \in \mathbb{R}$ and $m(\lambda_i) = 1$ for any $i = 1, \dots, n$, then $\phi$ is simple.

Proof It is immediate, by recalling Proposition 9.3.9 and (d) in Theorem
9.4.1. □

Exercise 9.4.3 Let $\phi$ be the endomorphism in $\mathbb{R}^2$ whose corresponding matrix with
respect to the canonical basis is the matrix

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

It is $p_A(T) = (1 - T)^2$: such a polynomial has only one root $\lambda = 1$ with algebraic
multiplicity $m = 2$. It is indeed easy to compute that $V_1 = \mathcal{L}((1, 0))$, so the
geometric multiplicity is 1. This proves that the matrix $A$ is not diagonalisable, and the
corresponding endomorphism is not simple.
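The geometric multiplicity in Exercise 9.4.3 can be computed from the relation $\dim(V_\lambda) = n - \mathrm{rk}(A - \lambda I_n)$ used in the proof of Proposition 9.2.16. The following is a sketch under our own choices: the rank is obtained by Gaussian elimination over the rationals, and the function names are illustrative.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """dim(V_lam) = n - rk(A - lam * I_n)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

A = [[1, 1], [0, 1]]                  # matrix of Exercise 9.4.3
print(geometric_multiplicity(A, 1))   # 1
```

Since the algebraic multiplicity of $\lambda = 1$ is 2, the value 1 returned here confirms that condition (d) of Theorem 9.4.1 fails for this matrix.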

Proposition 9.4.4 Let $\phi \in \mathrm{End}(V)$ be a simple endomorphism and $\mathcal{C}$ be a basis of
$V$ such that $\Delta = M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal. Then,

(a) the eigenvalues $\lambda_1, \dots, \lambda_s$ for $\phi$, counted with their multiplicities
$m(\lambda_1), \dots, m(\lambda_s)$, are the diagonal elements of $\Delta$;

(b) the diagonal matrix $\Delta$ is uniquely determined up to permutations of the
eigenvalues (such a permutation corresponds to a permutation in the ordering of the
basis elements in $\mathcal{C}$).

Proof (a) From Remark 9.2.1 we know that the diagonal elements in
$\Delta = M_\phi^{\mathcal{C},\mathcal{C}} \in \mathbb{R}^{n,n}$ are given by the eigenvalues
$\lambda_1, \dots, \lambda_s$: each eigenvalue $\lambda_i$ must be counted as many times as the geometric
multiplicity of $\lambda_i$, since $\mathcal{C}$ is a basis of eigenvectors. From the claim (d) in
Theorem 9.4.1, the geometric multiplicity of each eigenvalue coincides with its algebraic
multiplicity.

(b) This is obvious. □

Proposition 9.4.5 Let $\phi$ be a simple endomorphism on $V$, with $\mathcal{B}$ an arbitrary basis
of $V$. By setting $A = M_\phi^{\mathcal{B},\mathcal{B}}$, let $P$ be a matrix such that

$$P^{-1}AP = \Delta.$$

Then the columns of $P$ are the components, with respect to $\mathcal{B}$, of a basis of $V$ made
of eigenvectors for $\phi$.

Proof Let $\mathcal{C}$ be a basis of $V$ such that $\Delta = M_\phi^{\mathcal{C},\mathcal{C}}$. From
Remark 9.2.3 the basis $\mathcal{C}$ is made of eigenvectors for $\phi$. The claim follows by setting
$P = M^{\mathcal{B},\mathcal{C}}$, that is the matrix of the change of basis. □

Definition 9.4.6 Given a matrix $A \in \mathbb{R}^{n,n}$, its diagonalisation consists of determining
(if they exist) a diagonal matrix $\Delta \sim A$ and an invertible matrix $P \in \mathrm{GL}(n)$
such that $P^{-1}AP = \Delta$.

The following remark gives a summary of the steps needed for the diagonalisation
of a given matrix.

Remark 9.4.7 (An algorithm for the diagonalisation) Let $A \in \mathbb{R}^{n,n}$ be a square
matrix. In order to diagonalise it:

(1) Write the characteristic polynomial $p_A(T)$ of $A$ and find its roots $\lambda_1, \dots, \lambda_s$
with the corresponding algebraic multiplicities $m_1, \dots, m_s$.

(2) If one of the roots $\lambda_i \notin \mathbb{R}$, then $A$ is not diagonalisable.

(3) If $\lambda_i \in \mathbb{R}$ for any $i = 1, \dots, s$, compute the geometric multiplicities

$$\dim(V_{\lambda_i}) = n - \mathrm{rk}(A - \lambda_i I_n).$$

If there is an eigenvalue $\lambda_i$ such that $m_i \neq \dim(V_{\lambda_i})$, then $A$ is not
diagonalisable.

(4) If $\lambda_i \in \mathbb{R}$ and $m_i = \dim(V_{\lambda_i})$ for any $i = 1, \dots, s$, then $A$ is
diagonalisable. In such a case, $A$ is similar to a diagonal matrix $\Delta$: the eigenvalues
$\lambda_i$, counted with their multiplicities, give the diagonal elements of $\Delta$.

(5) It is $\Delta = M_\phi^{\mathcal{B},\mathcal{B}}$, where $\mathcal{B}$ is a basis of $V$ given by
eigenvectors for the endomorphism $\phi$ corresponding to the matrix $A$. By defining
$P = M^{\mathcal{E},\mathcal{B}}$ it is $\Delta = P^{-1}AP$. Since $V$ is the direct sum of the eigenspaces
for $A$ (see Theorem 9.4.1), it follows that $\mathcal{B} = \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_s$,
with $\mathcal{B}_i$ a basis of $V_{\lambda_i}$ for any $i = 1, \dots, s$. (The spaces $V_{\lambda_i}$ can be
obtained explicitly as in Lemma 9.2.15.)

Exercise 9.4.8 We study whether the matrix

$$A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}$$

is diagonalisable. Its characteristic polynomial is

$$p_A(T) = \det(A - T I_3)
= \begin{vmatrix} 3 - T & 1 & 1 \\ 1 & -T & 2 \\ 1 & 2 & -T \end{vmatrix}
= -T^3 + 3T^2 + 6T - 8 = -(T - 1)(T - 4)(T + 2).$$

Its eigenvalues are found to be $\lambda_1 = 1$, $\lambda_2 = 4$, $\lambda_3 = -2$. Since each root of the
characteristic polynomial has algebraic multiplicity $m = 1$, from Corollary 9.4.2
the matrix $A$ is diagonalisable, and indeed similar to

$$\Delta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -2 \end{pmatrix}.$$

We compute a basis $\mathcal{B}$ for $\mathbb{R}^3$ of eigenvectors for $A$. We know that
$V_1 = \ker(A - I_3)$, so $V_1$ is the space of the solutions of the homogeneous linear system
$(A - I_3)X = 0$ associated to the matrix

$$A - I_3 = \begin{pmatrix} 2 & 1 & 1 \\ 1 & -1 & 2 \\ 1 & 2 & -1 \end{pmatrix},$$

which can be reduced by row operations, for instance, to

$$\begin{pmatrix} 2 & 1 & 1 \\ 0 & -3 & 3 \\ 0 & 0 & 0 \end{pmatrix}.$$

The solutions of such a linear system are given by $(x, y, z) = (x, -x, -x)$, thus
$V_1 = \mathcal{L}((-1, 1, 1))$. Along the same lines we compute

$$V_4 = \ker(A - 4 I_3) = \mathcal{L}((2, 1, 1)),$$

$$V_{-2} = \ker(A + 2 I_3) = \mathcal{L}((0, -1, 1)).$$

We then have $\mathcal{B} = ((-1, 1, 1), (2, 1, 1), (0, -1, 1))$ and

$$P = M^{\mathcal{E},\mathcal{B}} = \begin{pmatrix} -1 & 2 & 0 \\ 1 & 1 & -1 \\ 1 & 1 & 1 \end{pmatrix}.$$

It is easy to compute that $P^{-1}AP = \Delta$.
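The last claim of Exercise 9.4.8 can be checked without inverting $P$: the relation $P^{-1}AP = \Delta$ is equivalent to $AP = P\Delta$ together with $\det(P) \neq 0$. The sketch below, with helper names of our own choosing, tests both conditions for the matrix with rows $(3, 1, 1)$, $(1, 0, 2)$, $(1, 2, 0)$, the basis of eigenvectors found in the exercise and the eigenvalues $1, 4, -2$.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]
P = [[-1, 2, 0], [1, 1, -1], [1, 1, 1]]         # columns: eigenvectors for 1, 4, -2
Delta = [[1, 0, 0], [0, 4, 0], [0, 0, -2]]

print(det3(P) != 0)                        # True: P is invertible
print(matmul(A, P) == matmul(P, Delta))    # True: A P = P Delta
```

Reading $AP = P\Delta$ column by column gives back $A v_j = \lambda_j v_j$ for each column $v_j$ of $P$, which is the content of Proposition 9.4.5.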

Proposition 9.4.9 Let $A \in \mathbb{R}^{n,n}$ be diagonalisable, with eigenvalues
$\lambda_1, \dots, \lambda_s$ and corresponding multiplicities $m_1, \dots, m_s$. Then

$$\det(A) = \lambda_1^{m_1} \cdot \lambda_2^{m_2} \cdots \lambda_s^{m_s},$$

$$\mathrm{tr}(A) = m_1\lambda_1 + m_2\lambda_2 + \cdots + m_s\lambda_s.$$

Proof Since $A$ is diagonalisable, there exists an invertible $n \times n$ matrix $P$
such that $\Delta = P^{-1}AP$. The matrix $\Delta$ is diagonal, and its diagonal elements are
(see Proposition 9.4.4) the eigenvalues of $A$ counted with their multiplicities.
Then, from Proposition 9.1.5 one has

$$\det(A) = \det(P^{-1}AP) = \det(\Delta) = \lambda_1^{m_1} \cdot \lambda_2^{m_2} \cdots \lambda_s^{m_s}$$

and

$$\mathrm{tr}(A) = \mathrm{tr}(P^{-1}AP) = \mathrm{tr}(\Delta) = m_1\lambda_1 + m_2\lambda_2 + \cdots + m_s\lambda_s.$$

□
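As an illustration of Proposition 9.4.9, one may compare the determinant and the trace of a diagonalisable matrix with the values predicted by its spectrum. The sketch below uses the matrix with rows $(3, 1, 1)$, $(1, 0, 2)$, $(1, 2, 0)$, whose eigenvalues $1, 4, -2$ all have multiplicity 1; the list `spectrum` of (eigenvalue, multiplicity) pairs and the function names are our own conventions.

```python
from math import prod

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[3, 1, 1], [1, 0, 2], [1, 2, 0]]
spectrum = [(1, 1), (4, 1), (-2, 1)]     # pairs (eigenvalue, multiplicity)

det_from_spectrum = prod(lam ** m for lam, m in spectrum)
trace_from_spectrum = sum(m * lam for lam, m in spectrum)

print(det3(A) == det_from_spectrum)      # True: both equal -8
print(trace(A) == trace_from_spectrum)   # True: both equal 3
```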


9.5  The  Jordan  Normal  Form 

In this section we briefly describe the notion of Jordan normal form of a matrix. As we
have described before in this chapter, a square matrix is not necessarily diagonalisable,
that is it is not necessarily similar to a diagonal matrix. It is nonetheless possible
to prove that any square matrix is similar to a triangular matrix $J$ which is not far
from being diagonal. Such a matrix $J$ is diagonal if and only if $A$ is diagonalisable;
if not it has a 'standard' block structure.

An  example  of  a  so  called  Jordan  block  is  the  non  diagonalisable  matrix  A  in 
Exercise  9.4.3.  We  denote  it  by 


M  l)  = 


l  l 
0  1 
A Jordan block of order $k$ is a $k$-dimensional upper triangular square matrix of the form

$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix},$$

where the diagonal terms are given by a scalar $\lambda \in \mathbb{R}$, the entries $(J_k(\lambda))_{j,j+1}$ are $1$ and the remaining entries are zero. It is immediate to show that the characteristic polynomial of such a matrix is given by

$$p_{J_k(\lambda)}(T) = (T - \lambda)^k,$$

and the parameter $\lambda$ is the unique eigenvalue, with algebraic multiplicity $m_\lambda = k$. The corresponding eigenspace is

$$V_\lambda = \ker(J_k(\lambda) - \lambda I_k) = \mathcal{L}((1, 0, \dots, 0)),$$

with geometric multiplicity $\dim(V_\lambda) = 1$. Thus, if $k > 1$, a Jordan block is not diagonalisable.
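The structure of a Jordan block can be explored with a short computation. The sketch below builds $J_k(\lambda)$ for the illustrative values $k = 4$, $\lambda = 5$ (chosen here only as an example) and checks that $N = J_k(\lambda) - \lambda I_k$ kills the first vector of the canonical basis and satisfies $N^{k-1} \neq 0$, $N^k = 0$. The function names are hypothetical.

```python
def jordan_block(lam, k):
    """Return the Jordan block J_k(lam) as a list of rows."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(k)]
            for i in range(k)]

def matmul(X, Y):
    """Row-by-column product of two square matrices."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

k, lam = 4, 5
J = jordan_block(lam, k)
# N = J - lam * I has ones on the first superdiagonal and zeros elsewhere.
N = [[J[i][j] - (lam if i == j else 0) for j in range(k)] for i in range(k)]

zero = [[0] * k for _ in range(k)]
power = N
for _ in range(k - 2):
    power = matmul(power, N)          # after the loop, power = N^(k-1)

print(all(row[0] == 0 for row in N))  # N e_1 = 0: e_1 is an eigenvector of J
print(power != zero)                  # N^(k-1) is not the zero matrix
print(matmul(power, N) == zero)       # N^k = 0
```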

A matrix $J$ is said to be in (canonical or normal) Jordan form if it has a block diagonal form

$$J = \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_s}(\lambda_s) \end{pmatrix},$$

where each $J_{k_j}(\lambda_j)$ is a Jordan block of order $k_j$ and eigenvalue $\lambda_j$, for $j = 1, \dots, s$.

Notice that nothing prevents the same eigenvalue from appearing in different Jordan blocks, that is $\lambda_j = \lambda_l$ even with $k_j \neq k_l$. Since each Jordan block $J_{k_j}(\lambda_j)$ provides a one dimensional eigenspace for $\lambda_j$, the geometric multiplicity of $\lambda_j$ coincides with the number of Jordan blocks with eigenvalue $\lambda_j$. The algebraic multiplicity of $\lambda_j$ coincides instead with the sum of the orders of the Jordan blocks having the same eigenvalue $\lambda_j$.

Theorem 9.5.1 (Jordan) Let $A \in \mathbb{R}^{n,n}$ be such that its characteristic polynomial has only real roots (such roots are all the eigenvalues for $A$). Then,

(i) the matrix $A$ is similar to a Jordan matrix,

(ii) two Jordan matrices $J$ and $J'$ are similar if and only if one is mapped into the other under a block permutation.
We omit a complete proof of this theorem, and we limit ourselves to briefly introducing the notion of generalised eigenvector of a matrix $A$. We recall that, when $A$ is not diagonalisable, the set of eigenvectors for $A$ is not enough for a basis of $\mathbb{R}^n$. The columns of the invertible matrix $P$ that realises the similarity between $A$ and the Jordan form $J$ (such that $P^{-1} A P = J$) are the components with respect to the canonical basis $\mathcal{E}_n$ of the so-called generalised eigenvectors for $A$.

Given an eigenvalue $\lambda$ for $A$ with algebraic multiplicity $m = m_\lambda > 1$, a corresponding generalised eigenvector is a non zero vector $v$ that solves the linear homogeneous system

$$(A - \lambda I_n)^m v = 0_{\mathbb{R}^n}.$$

It is possible to show that such a system has $m$ linearly independent solutions $v_j$ (with $j = 1, \dots, m$) which can be obtained by recursion,

$$(A - \lambda I_n) v_1 = 0_{\mathbb{R}^n},$$

$$(A - \lambda I_n) v_k = v_{k-1}, \qquad k = 2, \dots, m.$$

The elements $v_j$ span the generalised eigenspace $V_\lambda$ for $A$ corresponding to the eigenvalue $\lambda$. The generalised eigenvectors satisfy the condition

$$(A - \lambda I_n)^k v_k = 0_{\mathbb{R}^n} \quad \text{for any } k = 1, 2, \dots, m.$$

Since the characteristic polynomial of $A$ has in general complex roots, we end by noticing that a more natural version of the Jordan theorem is valid on $\mathbb{C}$.

Exercise 9.5.2 We consider the matrix

$$A = \begin{pmatrix} 5 & 4 & 2 & 1 \\ 0 & 1 & -1 & -1 \\ -1 & -1 & 3 & 0 \\ 1 & 1 & -1 & 2 \end{pmatrix}.$$

Its characteristic polynomial is computed to be $p_A(T) = (T - 1)(T - 2)(T - 4)^2$, so its eigenvalues are $\lambda = 1, 2, 4, 4$. Since the algebraic multiplicity of the eigenvalues $\lambda = 1$ and $\lambda = 2$ is 1, their geometric multiplicity is also 1. An explicit computation shows that

$$\dim(\ker(A - 4 I_4)) = 1.$$

We have then that $A$ is not diagonalisable, and that the eigenvalue $\lambda = 4$ corresponds to a Jordan block. A canonical form for the matrix $A$ is then given by

$$J = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$
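The explicit computation mentioned in this exercise can be sketched as follows, using exact rational arithmetic. The function `rank` is a hypothetical helper written for this illustration (plain Gaussian elimination), and the two vectors `v1`, `v2` are one possible choice of eigenvector and generalised eigenvector for $\lambda = 4$, not necessarily the one the reader would find first.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def apply(M, v):
    """Action of the matrix M on the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[5, 4, 2, 1],
     [0, 1, -1, -1],
     [-1, -1, 3, 0],
     [1, 1, -1, 2]]
# B = A - 4 I_4
B = [[A[i][j] - (4 if i == j else 0) for j in range(4)] for i in range(4)]

print(4 - rank(B))      # dimension of ker(A - 4 I_4): prints 1

v1 = [1, 0, -1, 1]      # eigenvector: (A - 4I) v1 = 0
v2 = [1, 0, 0, 0]       # generalised eigenvector: (A - 4I) v2 = v1
print(apply(B, v1))     # [0, 0, 0, 0]
print(apply(B, v2))     # [1, 0, -1, 1]
```

The pair `v1`, `v2` is an instance of the recursion $(A - \lambda I_n)v_1 = 0$, $(A - \lambda I_n)v_2 = v_1$ for the generalised eigenvectors.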
Exercise 9.5.3 The matrices

$$J = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix}, \qquad J' = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}$$

have the same characteristic polynomial, the same determinant, and the same trace. They are however not similar, since they are in Jordan form, and there is no block permutation under which $J$ is mapped into $J'$.
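A quick way to see that two such matrices cannot be similar is to compare the geometric multiplicity of the eigenvalue $3$, which is a similarity invariant. The sketch below assumes the two matrices to be the Jordan matrices with blocks of orders $(2, 2)$ and $(2, 1, 1)$; the helper `rank` is hypothetical and implements plain Gaussian elimination.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(M, lam):
    """dim ker(M - lam I), computed as n - rank(M - lam I)."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

J1 = [[3, 1, 0, 0], [0, 3, 0, 0], [0, 0, 3, 1], [0, 0, 0, 3]]  # blocks 2 + 2
J2 = [[3, 1, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0], [0, 0, 0, 3]]  # blocks 2 + 1 + 1

print(geometric_multiplicity(J1, 3))  # 2, the number of Jordan blocks of J1
print(geometric_multiplicity(J2, 3))  # 3, the number of Jordan blocks of J2
```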


Chapter  10 

Spectral  Theorems  on  Euclidean  Spaces 




In Chap. 7 we studied the operation of changing a basis for a real vector space. In particular, in the Theorem 7.9.6 and the Remark 7.9.7 there, we showed that any matrix giving a change of basis for the vector space $\mathbb{R}^n$ is an invertible $n \times n$ matrix, and noticed that any $n \times n$ invertible matrix yields a change of basis for $\mathbb{R}^n$.

In this chapter we shall consider the endomorphisms of the euclidean space $E^n = (\mathbb{R}^n, \cdot)$, where the symbol $\cdot$ denotes the euclidean scalar product that we have described in Chap. 3.


10.1  Orthogonal  Matrices  and  Isometries 

As we noticed, the natural notion of basis for a euclidean space is that of an orthonormal one. This restricts the focus to matrices which give a change of basis between orthonormal bases for $E^n$.

Definition 10.1.1 A square matrix $A \in \mathbb{R}^{n,n}$ is called orthogonal if its columns form an orthonormal basis $\mathcal{B}$ for $E^n$. In such a case $A = M^{\mathcal{E},\mathcal{B}}$, that is $A$ is the matrix giving the change of basis from the canonical basis $\mathcal{E}$ to the basis $\mathcal{B}$.

It follows from this definition that an orthogonal matrix is invertible.

Exercise 10.1.2 The identity matrix $I_n$ is clearly orthogonal for each $E^n$. Since the vectors

$$v_1 = \frac{1}{\sqrt{2}}(1, 1), \qquad v_2 = \frac{1}{\sqrt{2}}(1, -1)$$

form an orthonormal basis for $E^2$, the matrix

$$A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

is orthogonal.

Proposition 10.1.3 A matrix $A$ is orthogonal if and only if

$${}^tA\, A = I_n,$$

that is if and only if $A^{-1} = {}^tA$.

Proof With $(v_1, \dots, v_n)$ a system of vectors in $E^n$, we denote by $A = (v_1 \cdots v_n)$ the matrix with columns given by the given vectors, and by

$${}^tA = \begin{pmatrix} {}^tv_1 \\ \vdots \\ {}^tv_n \end{pmatrix}$$

its transpose. We have the following equivalences. The matrix $A$ is orthogonal (by definition) if and only if $(v_1, \dots, v_n)$ is an orthonormal basis for $E^n$, that is if and only if $v_i \cdot v_j = \delta_{ij}$ for any $i, j$. Recalling the representation of the row by column product of matrices, one has $v_i \cdot v_j = \delta_{ij}$ if and only if $({}^tA A)_{ij} = \delta_{ij}$ for any $i, j$, which amounts to saying that ${}^tA A = I_n$. □

Exercise 10.1.4 For the matrix $A$ considered in the Exercise 10.1.2 one easily computes that $A = {}^tA$ and $A^2 = I_2$.
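The criterion ${}^tA A = I_n$ lends itself to a direct numerical test. The sketch below assumes the matrix of the Exercise 10.1.2 to be $\frac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ 1 & -1\end{pmatrix}$; the second matrix `S` is a hypothetical example, chosen here only to show a matrix of determinant $1$ failing the test. The function names and the tolerance are choices of this illustration.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_orthogonal(M, tol=1e-12):
    """Test tM M = I_n entry by entry, up to a floating point tolerance."""
    n = len(M)
    G = matmul(transpose(M), M)
    return all(math.isclose(G[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
A = [[s, s], [s, -s]]        # assumed matrix of Exercise 10.1.2
S = [[1.0, 0.0], [1.0, 1.0]] # hypothetical: det(S) = 1 but S is not orthogonal

print(is_orthogonal(A))      # True
print(A == transpose(A))     # True: A is symmetric, hence A^2 = tA A = I_2
print(is_orthogonal(S))      # False
```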

Exercise  10.1.5  The  matrix 


is not orthogonal, since ${}^tA A \neq I_2$.


Proposition  10.1.6  If  A  is  orthogonal,  then  det(A)  =  ±1. 

Proof This statement easily follows from the Binet Theorem 5.1.16: with ${}^tA A = I_n$, one has

$$\det({}^tA) \det(A) = \det(I_n) = 1,$$

and the Corollary 5.1.12, that is $\det({}^tA) = \det(A)$, which then implies $(\det(A))^2 = 1$. □

Remark 10.1.7 The converse to this statement does not hold. The matrix $A$ from the Exercise 10.1.5 is not orthogonal, while $\det(A) = 1$.

Definition 10.1.8 An orthogonal matrix $A$ with $\det(A) = 1$ is called special orthogonal.

Proposition 10.1.9 The set $O(n)$ of orthogonal matrices in $\mathbb{R}^{n,n}$ is a group, with respect to the usual matrix product. Its subset $SO(n) = \{A \in O(n) : \det(A) = 1\}$ is a subgroup of $O(n)$ with respect to the same product.

Proof We prove that $O(n)$ is stable under the matrix product, has an identity element, and the inverse of an orthogonal matrix is orthogonal as well.

• The identity matrix $I_n$ is orthogonal, as we already mentioned.

• If $A$ and $B$ are orthogonal, then we can write

$${}^t(AB)\,AB = {}^tB\,{}^tA\,A\,B = {}^tB\, I_n\, B = {}^tB\,B = I_n,$$

that is, $AB$ is orthogonal.

• If $A$ is orthogonal, ${}^tA A = I_n$, then

$${}^t(A^{-1})\,A^{-1} = (A\,{}^tA)^{-1} = I_n,$$

which proves that $A^{-1}$ is orthogonal.

From the Binet theorem it easily follows that the set of special orthogonal matrices is stable under the product, and the inverse of a special orthogonal matrix is special orthogonal. □

Definition 10.1.10 The group $O(n)$ is called the orthogonal group of order $n$, its subset $SO(n)$ is called the special orthogonal group of order $n$.

We know from the Definition 10.1.1 that a matrix is orthogonal if and only if it is the matrix of the change of basis between the canonical basis $\mathcal{E}$ (which is orthonormal) and a second orthonormal basis $\mathcal{B}$. A matrix $A$ is then orthogonal if and only if $A^{-1} = {}^tA$ (Proposition 10.1.3).

The next theorem shows that we do not need the canonical basis. If one defines a matrix $A$ to be orthogonal by the condition $A^{-1} = {}^tA$, then $A$ is the matrix for a change between two orthonormal bases and, vice versa, any matrix $A$ giving the change between orthonormal bases satisfies the condition $A^{-1} = {}^tA$.

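Theorem 10.1.11 Let $\mathcal{C}$ be an orthonormal basis for the euclidean vector space $E^n$, with $\mathcal{B}$ another (arbitrary) basis for it. The matrix $M^{\mathcal{C},\mathcal{B}}$ of the change of basis from $\mathcal{C}$ to $\mathcal{B}$ is orthogonal if and only if also the basis $\mathcal{B}$ is orthonormal.

Proof We start by noticing that, since $\mathcal{C}$ is an orthonormal basis, the matrix $M^{\mathcal{E},\mathcal{C}}$ giving the change of basis between the canonical basis $\mathcal{E}$ and $\mathcal{C}$ is orthogonal by the Definition 10.1.1. It follows that, $O(n)$ being a group, the inverse $M^{\mathcal{C},\mathcal{E}} = (M^{\mathcal{E},\mathcal{C}})^{-1}$ is orthogonal. With $\mathcal{B}$ an arbitrary basis, from the Theorem 7.9.9 we can write

$$M^{\mathcal{C},\mathcal{B}} = M^{\mathcal{C},\mathcal{E}}\, M^{\mathcal{E},\mathcal{E}}\, M^{\mathcal{E},\mathcal{B}} = M^{\mathcal{C},\mathcal{E}}\, I_n\, M^{\mathcal{E},\mathcal{B}} = M^{\mathcal{C},\mathcal{E}}\, M^{\mathcal{E},\mathcal{B}}.$$

Firstly, let us assume $\mathcal{B}$ to be orthonormal. We have then that $M^{\mathcal{E},\mathcal{B}}$ is orthogonal; thus $M^{\mathcal{C},\mathcal{B}}$ is orthogonal since it is the product of orthogonal matrices.

Next, let us assume that $M^{\mathcal{C},\mathcal{B}}$ is orthogonal; from the chain relations displayed above we have

$$M^{\mathcal{E},\mathcal{B}} = (M^{\mathcal{C},\mathcal{E}})^{-1}\, M^{\mathcal{C},\mathcal{B}} = M^{\mathcal{E},\mathcal{C}}\, M^{\mathcal{C},\mathcal{B}}.$$

This matrix $M^{\mathcal{E},\mathcal{B}}$ is then orthogonal (being the product of orthogonal matrices), and therefore $\mathcal{B}$ is an orthonormal basis. □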


We pass to endomorphisms corresponding to orthogonal matrices. We start by recalling, from the Definition 3.1.4, that a scalar product has a 'canonical' form when it is given with respect to orthonormal bases.

Remark 10.1.12 Let $\mathcal{C}$ be an orthonormal basis for the euclidean space $E^n$. If $v, w \in E^n$ are given by $v = (x_1, \dots, x_n)_{\mathcal{C}}$ and $w = (y_1, \dots, y_n)_{\mathcal{C}}$, one has that $v \cdot w = x_1 y_1 + \cdots + x_n y_n$. By denoting $X$ and $Y$ the one-column matrices whose entries are the components of $v, w$ with respect to $\mathcal{C}$, that is

$$X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$

we can write

$$v \cdot w = x_1 y_1 + \cdots + x_n y_n = (x_1 \ \dots \ x_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = {}^tX\, Y.$$

Theorem 10.1.13 Let $\phi \in \mathrm{End}(E^n)$, with $\mathcal{E}$ the canonical basis of $E^n$. The following statements are equivalent:

(i) The matrix $A = M_\phi^{\mathcal{E},\mathcal{E}}$ is orthogonal.

(ii) It holds that $\phi(v) \cdot \phi(w) = v \cdot w$ for any $v, w \in E^n$.

(iii) If $\mathcal{B} = (b_1, \dots, b_n)$ is an orthonormal basis for $E^n$, then the set $\mathcal{B}' = (\phi(b_1), \dots, \phi(b_n))$ is an orthonormal basis as well.

Proof (i) ⇒ (ii): by denoting $X$ and $Y$ the columns of the components of $v$ and $w$ with respect to $\mathcal{E}$, we can write

$$v \cdot w = {}^tX\,Y, \qquad \phi(v) \cdot \phi(w) = {}^t(AX)(AY) = {}^tX({}^tA A)Y,$$

and since $A$ is orthogonal, ${}^tA A = I_n$, we conclude that $\phi(v) \cdot \phi(w) = v \cdot w$ for any $v, w \in E^n$.

(ii) ⇒ (iii): let $A = M_\phi^{\mathcal{E},\mathcal{E}}$ be the matrix of the endomorphism $\phi$ with respect to the basis $\mathcal{E}$. We start by proving that $A$ is invertible. By adopting the notation used above, we can represent the condition $\phi(v) \cdot \phi(w) = v \cdot w$ as ${}^t(AX)(AY) = {}^tX\,Y$ for any $X, Y$. It follows that ${}^tA A = I_n$, that is $A$ is orthogonal, and then invertible. This means (see Theorem 7.8.4) that $\phi$ is an isomorphism, so it maps a basis for $E^n$ into a basis for $E^n$. If $\mathcal{B}$ is an orthonormal basis, then we can write

$$\phi(b_i) \cdot \phi(b_j) = b_i \cdot b_j = \delta_{ij},$$

which proves that $\mathcal{B}'$ is an orthonormal basis.

(iii) ⇒ (i): since $\mathcal{E}$, the canonical basis for $E^n$, is orthonormal, then $(\phi(e_1), \dots, \phi(e_n))$ is orthonormal. Recall the Remark 7.1.10: the components with respect to $\mathcal{E}$ of the elements $\phi(e_i)$ are the column vectors of the matrix $M_\phi^{\mathcal{E},\mathcal{E}}$, thus $M_\phi^{\mathcal{E},\mathcal{E}}$ is orthogonal. □

We have seen that, if the action of $\phi \in \mathrm{End}(E^n)$ is represented with respect to the canonical basis by an orthogonal matrix, then $\phi$ is an isomorphism and preserves the scalar product, that is, for any $v, w \in E^n$ one has that

$$v \cdot w = \phi(v) \cdot \phi(w).$$

The next result is therefore evident.

Corollary 10.1.14 If $\phi \in \mathrm{End}(E^n)$ is an endomorphism of the euclidean space $E^n$ whose corresponding matrix with respect to the canonical basis is orthogonal, then $\phi$ preserves the norms, that is, for any $v \in E^n$ one has

$$\|\phi(v)\| = \|v\|.$$

This is the reason why such an endomorphism is also called an isometry.
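As a numerical illustration of the isometry property, one may take a plane rotation, whose matrix with respect to the canonical basis of $E^2$ is orthogonal, and compare scalar products and norms before and after its action. The angle and the two vectors below are arbitrary choices made for this sketch, as are the helper names.

```python
import math

def apply(M, v):
    """Action of the matrix M on the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

theta = 0.7                                   # arbitrary angle, in radians
R = [[math.cos(theta), -math.sin(theta)],     # rotation matrix, an element of SO(2)
     [math.sin(theta), math.cos(theta)]]

v, w = [3.0, -1.0], [0.5, 2.0]                # arbitrary vectors of E^2

# The scalar product and the norm are unchanged, up to rounding errors.
print(math.isclose(dot(apply(R, v), apply(R, w)), dot(v, w), abs_tol=1e-12))
print(math.isclose(norm(apply(R, v)), norm(v), abs_tol=1e-12))
```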

The analysis we developed so far allows us to introduce the following definition, which will be more extensively scrutinised when dealing with rotation maps.

Definition 10.1.15 If $\phi \in \mathrm{End}(E^n)$ takes the orthonormal basis $\mathcal{B} = (b_1, \dots, b_n)$ to the orthonormal basis $\mathcal{B}' = (b_1' = \phi(b_1), \dots, b_n' = \phi(b_n))$ in $E^n$, we say that $\mathcal{B}$ and $\mathcal{B}'$ have the same orientation if the matrix representing the endomorphism $\phi$ is special orthogonal.

Remark 10.1.16 It is evident that this definition provides an equivalence relation within the collection of all orthonormal bases for $E^n$. The corresponding quotient can be labelled by the values of the determinant of the orthogonal map giving the change of basis, that is $\det \phi \in \{\pm 1\}$. This is usually referred to by saying that the euclidean space $E^n$ has two orientations.


10.2  Self-adjoint  Endomorphisms 

We  need  to  introduce  an  important  class  of  endomorphisms. 

Definition 10.2.1 An endomorphism $\phi$ of the euclidean vector space $E^n$ is called self-adjoint if

$$\phi(v) \cdot w = v \cdot \phi(w) \qquad \forall\, v, w \in E^n.$$

From the Proposition 9.2.11 we know that eigenvectors corresponding to distinct eigenvalues are linearly independent. When dealing with self-adjoint endomorphisms, a stronger property holds.

Proposition 10.2.2 Let $\phi$ be a self-adjoint endomorphism of $E^n$, with $\lambda_1, \lambda_2 \in \mathbb{R}$ different eigenvalues for it. Any two corresponding eigenvectors, $0 \neq v_1 \in V_{\lambda_1}$ and $0 \neq v_2 \in V_{\lambda_2}$, are orthogonal.

Proof Since $\phi$ is self-adjoint, one has $\phi(v_1) \cdot v_2 = v_1 \cdot \phi(v_2)$ while, $v_1$ and $v_2$ being eigenvectors, one has $\phi(v_i) = \lambda_i v_i$ for $i = 1, 2$. We can then write

$$(\lambda_1 v_1) \cdot v_2 = v_1 \cdot (\lambda_2 v_2),$$

which reads

$$\lambda_1 (v_1 \cdot v_2) = \lambda_2 (v_1 \cdot v_2) \quad \Rightarrow \quad (\lambda_2 - \lambda_1)(v_1 \cdot v_2) = 0.$$

The assumption that the eigenvalues are different allows one to conclude that $v_1 \cdot v_2 = 0$, that is $v_1$ is orthogonal to $v_2$. □

The self-adjointness of an endomorphism can be characterised in terms of properties of the matrices representing its action on $E^n$. We recall from the Definition 4.1.21 that a matrix $A = (a_{ij}) \in \mathbb{R}^{n,n}$ is called symmetric if ${}^tA = A$, that is if one has $a_{ij} = a_{ji}$, for any $i, j$.

Theorem 10.2.3 Let $\phi \in \mathrm{End}(E^n)$ and $\mathcal{B}$ an orthonormal basis for $E^n$. The endomorphism $\phi$ is self-adjoint if and only if $M_\phi^{\mathcal{B},\mathcal{B}}$ is symmetric.

Proof Using the usual notation, we set $A = (a_{ij}) = M_\phi^{\mathcal{B},\mathcal{B}}$ and let $X, Y$ be the columns giving the components with respect to $\mathcal{B}$ of the vectors $v, w$ in $E^n$. From the Remark 10.1.12 we write

$$\phi(v) \cdot w = {}^t(AX)\,Y = ({}^tX\,{}^tA)\,Y = {}^tX\,{}^tA\,Y$$

and

$$v \cdot \phi(w) = {}^tX\,(AY) = {}^tX\,A\,Y.$$

Let us assume $A$ to be symmetric. From the relations above we conclude that $\phi(v) \cdot w = v \cdot \phi(w)$ for any $v, w \in E^n$, that is $\phi$ is self-adjoint.

If we assume $\phi$ to be self-adjoint, then we can equate

$${}^tX\,{}^tA\,Y = {}^tX\,A\,Y$$

for any $X, Y$ in $\mathbb{R}^n$. If we let $X$ and $Y$ range over the elements of the canonical basis $\mathcal{E} = (e_1, \dots, e_n)$ in $\mathbb{R}^n$, such a condition is just the fact that $a_{ij} = a_{ji}$ for any $i, j$, that is $A$ is symmetric. □

Exercise 10.2.4 The following matrix is symmetric:

$$A = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}.$$

Then the endomorphism $\phi \in \mathrm{End}(E^2)$ corresponding to $A$ with respect to the canonical basis is self-adjoint. This can also be shown by a direct calculation: $\phi((x, y)) = (2x - y, -x + 3y)$; then

$$(a, b) \cdot \phi((x, y)) = a(2x - y) + b(-x + 3y) = (2a - b)x + (-a + 3b)y = \phi((a, b)) \cdot (x, y).$$

Exercise 10.2.5 The following matrix is not symmetric:

$$A = \begin{pmatrix} 1 & 1 \\ -1 & 0 \end{pmatrix}.$$

The corresponding (with respect to the canonical basis) endomorphism $\phi \in \mathrm{End}(E^2)$ is indeed not self-adjoint since, for instance,

$$\phi(e_1) \cdot e_2 = (1, -1) \cdot (0, 1) = -1,$$

$$e_1 \cdot \phi(e_2) = (1, 0) \cdot (1, 0) = 1.$$
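The two exercises above can be mirrored by a small test of the condition $\phi(v) \cdot w = v \cdot \phi(w)$. By bilinearity it is enough to test it on the vectors of a basis, which is what the sketch below does. The matrices are assumed to be the ones read off from the formulas $\phi((x, y)) = (2x - y, -x + 3y)$ and $\phi(e_1) = (1, -1)$, $\phi(e_2) = (1, 0)$; the function names are hypothetical.

```python
def apply(M, v):
    """Action of the matrix M on the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def is_self_adjoint(M, basis):
    """Check phi(v).w == v.phi(w) on all pairs of vectors of the given basis."""
    return all(dot(apply(M, v), w) == dot(v, apply(M, w))
               for v in basis for w in basis)

canonical = [[1, 0], [0, 1]]
A_sym = [[2, -1], [-1, 3]]     # symmetric matrix
A_non = [[1, 1], [-1, 0]]      # non symmetric matrix

print(is_self_adjoint(A_sym, canonical))   # True
print(is_self_adjoint(A_non, canonical))   # False
print(dot(apply(A_non, [1, 0]), [0, 1]),   # phi(e1).e2 = -1
      dot([1, 0], apply(A_non, [0, 1])))   # e1.phi(e2) = 1
```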


An important family of self-adjoint endomorphisms is illustrated in the following exercise.

Exercise 10.2.6 We know from Sect. 8.2 that, if $\mathcal{B} = (e_1, \dots, e_n)$ is an orthonormal basis for $E^n$, then the action of an endomorphism $\phi$ whose associated matrix is $\Phi = M_\phi^{\mathcal{B},\mathcal{B}}$ can be written with the Dirac's notation as

$$\phi = \sum_{a,b=1}^{n} \Phi_{ab}\, |e_a\rangle\langle e_b|$$

with $\Phi_{ab} = \langle e_a | \phi(e_b) \rangle$. Then, the endomorphism $\phi$ is self-adjoint if and only if $\Phi_{ab} = \Phi_{ba}$. Consider vectors $u = (u_1, \dots, u_n)_{\mathcal{B}}$, $v = (v_1, \dots, v_n)_{\mathcal{B}}$ in $E^n$, and define the operator $L = |u\rangle\langle v|$. We have

$$\langle e_a | L e_b \rangle = \langle e_a | u \rangle \langle v | e_b \rangle = u_a v_b,$$

$$\langle e_b | L e_a \rangle = \langle e_b | u \rangle \langle v | e_a \rangle = u_b v_a,$$

so the operator $L = |u\rangle\langle v|$ is self-adjoint if and only if $u_a v_b = u_b v_a$ for any $a, b$, that is if and only if $u$ and $v$ are linearly dependent. In particular, $L$ is self-adjoint when $u = v$.

Exercise 10.2.7 Let $\phi$ be a self-adjoint endomorphism of the euclidean space $E^n$, and let $\mathcal{B} = (e_1, \dots, e_n)$ be a basis made of orthonormal eigenvectors for $\phi$ with corresponding eigenvalues $(\lambda_1, \dots, \lambda_n)$ (not necessarily all distinct). A direct computation shows that, in the Dirac's notation, the action of $\phi$ can be written as

$$\phi = \lambda_1 |e_1\rangle\langle e_1| + \cdots + \lambda_n |e_n\rangle\langle e_n|,$$

so that, for any $v \in E^n$, one writes

$$\phi(v) = \lambda_1 |e_1\rangle\langle e_1 | v \rangle + \cdots + \lambda_n |e_n\rangle\langle e_n | v \rangle.$$


10.3  Orthogonal  Projections 

As we saw in Chap. 3, given any vector subspace $W \subset E^n$, with orthogonal complement $W^\perp$, we have a direct sum decomposition $E^n = W \oplus W^\perp$, so for any vector $v \in E^n$ we have (see the Proposition 3.2.5) a unique decomposition $v = v_W + v_{W^\perp}$. This suggests the following definition.

Definition 10.3.1 Given the (canonical) euclidean space $E^n$ with $W \subset E^n$ a vector subspace and the orthogonal sum decomposition $v = v_W + v_{W^\perp}$, the map

$$P_W : E^n \to E^n, \qquad v \mapsto v_W$$

is linear, and it is called the orthogonal projection onto the subspace $W$. The dimension of $W$ is called the rank of the orthogonal projection $P_W$.

If $W \subset E^n$ it is easy to see that $\mathrm{Im}(P_W) = W$ while $\ker(P_W) = W^\perp$. Moreover, since $P_W$ acts as an identity operator on its range $W$, one also has $P_W^2 = P_W$. If $u, v$ are vectors in $E^n$, with orthogonal sum decomposition $u = u_W + u_{W^\perp}$ and $v = v_W + v_{W^\perp}$, we can explicitly compute

$$P_W(u) \cdot v = u_W \cdot (v_W + v_{W^\perp}) = u_W \cdot v_W$$

and

$$u \cdot P_W(v) = (u_W + u_{W^\perp}) \cdot v_W = u_W \cdot v_W.$$

This shows that orthogonal projectors are self-adjoint endomorphisms. To which extent can one reverse these computations, that is, can one characterise, within all self-adjoint endomorphisms, the collection of orthogonal projectors? This is the content of the next proposition.

Proposition 10.3.2 Given the euclidean vector space $E^n$, an endomorphism $\phi \in \mathrm{End}(E^n)$ is an orthogonal projection if and only if it is self-adjoint and satisfies the condition $\phi^2 = \phi$.

Proof We have already shown that the conditions are necessary for an endomorphism to be an orthogonal projection in $E^n$. Let us now assume that $\phi$ is a self-adjoint endomorphism fulfilling $\phi^2 = \phi$. For any choice of $u, v \in E^n$ we have

$$((1 - \phi)(u)) \cdot \phi(v) = u \cdot \phi(v) - \phi(u) \cdot \phi(v)$$
$$= u \cdot \phi(v) - u \cdot \phi^2(v)$$
$$= u \cdot \phi(v) - u \cdot \phi(v) = 0,$$

with the second line coming from the self-adjointness of $\phi$ and the third line from the condition $\phi^2 = \phi$. This shows that the vector subspace $\mathrm{Im}(1 - \phi)$ is orthogonal to the vector subspace $\mathrm{Im}(\phi)$. We can then decompose any vector $y \in E^n$ as an orthogonal sum $y = y_{\mathrm{Im}(1-\phi)} + y_{\mathrm{Im}\,\phi} + \xi$, where $\xi$ is an element in the vector subspace orthogonal to the sum $\mathrm{Im}(1 - \phi) \oplus \mathrm{Im}(\phi)$. For any $u \in E^n$ and any such vector $\xi$ we have

$$\phi(u) \cdot \xi = 0, \qquad ((1 - \phi)(u)) \cdot \xi = 0.$$

These conditions give that $u \cdot \xi = 0$ for any $u \in E^n$, so we can conclude that $\xi = 0$. Thus we have the orthogonal vector space decomposition

$$E^n = \mathrm{Im}(1 - \phi) \oplus \mathrm{Im}(\phi).$$

We show next that $\ker(\phi) = \mathrm{Im}(1 - \phi)$. If $u \in \mathrm{Im}(1 - \phi)$, we have $u = (1 - \phi)v$ with $v \in E^n$, thus $\phi(u) = \phi(1 - \phi)v = 0$, that is $\mathrm{Im}(1 - \phi) \subseteq \ker(\phi)$. Conversely, if $u \in \ker(\phi)$, then $\phi(u) \cdot v = 0$ for any $v \in E^n$, and $u \cdot \phi(v) = 0$, since $\phi$ is self-adjoint, which gives $\ker(\phi) \subseteq (\mathrm{Im}(\phi))^\perp$ and thus $\ker(\phi) \subseteq \mathrm{Im}(1 - \phi)$, from the decomposition of $E^n$ above.

If $w \in \mathrm{Im}(\phi)$, then $w = \phi(x)$ for a given $x \in E^n$, thus $\phi(w) = \phi^2(x) = \phi(x) = w$. We have shown that we can identify $\phi = P_{\mathrm{Im}(\phi)}$. This concludes the proof. □

Exercise 10.3.3 Consider the three dimensional euclidean space $E^3$ with canonical basis and take $W = \mathcal{L}((1, 1, 1))$. Its orthogonal subspace is given by the vectors $(x, y, z)$ whose components solve the linear equation $\Sigma : x + y + z = 0$, so we get $S_\Sigma = W^\perp = \mathcal{L}((1, -1, 0), (1, 0, -1))$. The vectors of the canonical basis, when expressed with respect to the vectors $u_1 = (1, 1, 1)$ spanning $W$ and $u_2 = (1, -1, 0)$, $u_3 = (1, 0, -1)$ spanning $W^\perp$, are written as

$$e_1 = \tfrac{1}{3}(u_1 + u_2 + u_3),$$
$$e_2 = \tfrac{1}{3}(u_1 - 2u_2 + u_3),$$
$$e_3 = \tfrac{1}{3}(u_1 + u_2 - 2u_3).$$

Therefore,

$$P_W(e_1) = \tfrac{1}{3}\,u_1, \qquad P_W(e_2) = \tfrac{1}{3}\,u_1, \qquad P_W(e_3) = \tfrac{1}{3}\,u_1,$$

and

$$P_{W^\perp}(e_1) = \tfrac{1}{3}(u_2 + u_3), \qquad P_{W^\perp}(e_2) = \tfrac{1}{3}(-2u_2 + u_3), \qquad P_{W^\perp}(e_3) = \tfrac{1}{3}(u_2 - 2u_3).$$
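The projections onto $W = \mathcal{L}((1, 1, 1))$ and onto its orthogonal complement can be recomputed with exact fractions. The sketch below uses the standard formula $P_W(v) = \frac{v \cdot u}{u \cdot u}\, u$ for the projection onto a line spanned by $u$, which is not derived in the exercise but agrees with it; the function names are hypothetical.

```python
from fractions import Fraction

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u."""
    c = Fraction(dot(v, u), dot(u, u))
    return [c * x for x in u]

u1 = [1, 1, 1]
canonical = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for e in canonical:
    p = project(e, u1)                     # component along W
    q = [a - b for a, b in zip(e, p)]      # component along the complement
    print([str(x) for x in p], [str(x) for x in q])
```

For $e_1$ the second list is $(2/3, -1/3, -1/3)$, which is indeed $\frac{1}{3}(u_2 + u_3)$ with $u_2 = (1, -1, 0)$ and $u_3 = (1, 0, -1)$.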

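Remark 10.3.4 Given an orthogonal space decomposition $E^n = W \oplus W^\perp$, the union of the bases $\mathcal{B}_W$ and $\mathcal{B}_{W^\perp}$ of $W$ and $W^\perp$ is a basis $\mathcal{B}$ for $E^n$. It is easy to see that the matrix associated to the orthogonal projection operator $P_W$ with respect to such a basis $\mathcal{B}$ has a block diagonal structure

$$M_{P_W}^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix},$$

where the order of the diagonal identity block is the dimension of $W = \mathrm{Im}(P_W)$. This makes it evident that an orthogonal projection operator is diagonalisable: its spectrum contains the real eigenvalue $\lambda = 1$ with multiplicity equal to $m_{\lambda=1} = \dim(W)$ and the real eigenvalue $\lambda = 0$ with multiplicity equal to $m_{\lambda=0} = \dim(W^\perp)$.

It is clear that the rank of $P_W$ (the dimension of $W$) is given by the trace $\mathrm{tr}(M_{P_W}^{\mathcal{B},\mathcal{B}})$ irrespectively of the basis chosen to represent the projection (see the Proposition 9.1.5) since as usual, for a change of basis with matrix $M^{\mathcal{B},\mathcal{C}}$, one has that $M_{P_W}^{\mathcal{C},\mathcal{C}} = M^{\mathcal{C},\mathcal{B}}\, M_{P_W}^{\mathcal{B},\mathcal{B}}\, M^{\mathcal{B},\mathcal{C}}$, with $M^{\mathcal{B},\mathcal{C}} = (M^{\mathcal{C},\mathcal{B}})^{-1}$.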

Exercise 10.3.5 The matrix

$$M = \begin{pmatrix} a & \sqrt{a - a^2} \\ \sqrt{a - a^2} & 1 - a \end{pmatrix}$$

is symmetric and satisfies $M^2 = M$ for any $a \in (0, 1]$. With respect to an orthonormal basis $(e_1, e_2)$ for $E^2$, it is then associated to an orthogonal projection with rank given by $\mathrm{tr}(M) = 1$. In order to determine its range, we diagonalise $M$. Its characteristic polynomial is

$$p_M(T) = \det(M - T I_2) = T^2 - T$$

and the eigenvalues are then $\lambda = 0$ and $\lambda = 1$. Since they are both simple, the matrix $M$ is diagonalisable. The eigenspace $V_{\lambda=1}$ corresponding to the range of the orthogonal projection is one dimensional and given as the solution $(x, y)$ of the system

$$\begin{pmatrix} a - 1 & \sqrt{a - a^2} \\ \sqrt{a - a^2} & -a \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

that is $\left(x, \sqrt{\tfrac{1-a}{a}}\, x\right)$ with $x \in \mathbb{R}$. This means that the range of the projection is given by $\mathcal{L}\left(\left(1, \sqrt{\tfrac{1-a}{a}}\right)\right)$.

We leave as an exercise to show that $M$ is the most general rank 1 orthogonal projection in $E^2$.
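The properties of the matrix $M$ stated in this exercise can be checked numerically for a sample value of the parameter. In the sketch below the value $a = 0.25$ is an arbitrary choice, the tolerance is a choice of this illustration, and the function names are hypothetical.

```python
import math

def projector(a):
    """The matrix M of the exercise, for a parameter 0 < a <= 1."""
    s = math.sqrt(a - a * a)
    return [[a, s], [s, 1 - a]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to a tolerance."""
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))

a = 0.25
M = projector(a)

print(M[0][1] == M[1][0])          # M is symmetric
print(close(matmul(M, M), M))      # M^2 = M up to rounding
print(M[0][0] + M[1][1])           # the trace, equal to the rank 1

# The vector (1, sqrt((1 - a) / a)) spans the range: M fixes it.
w = [1.0, math.sqrt((1 - a) / a)]
Mw = [M[0][0] * w[0] + M[0][1] * w[1], M[1][0] * w[0] + M[1][1] * w[1]]
print(all(abs(p - q) <= 1e-12 for p, q in zip(Mw, w)))
```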

Exercise 10.3.6 We know from Exercise 10.2.6 that the operator $L = |u\rangle\langle u|$ is self-adjoint. We compute

$$L^2 = |u\rangle\langle u | u \rangle\langle u| = \|u\|^2\, L.$$

Thus such an operator $L$ is an orthogonal projection if and only if $\|u\| = 1$. It is then the rank one orthogonal projection $L = |u\rangle\langle u| = P_{\mathcal{L}(u)}$.

Let us assume that $W_1$ and $W_2$ are two orthogonal subspaces (to be definite we take $W_2 \subset W_1^\perp$). By using for instance the Remark 10.3.4 it is not difficult to show that $P_{W_1} P_{W_2} = P_{W_2} P_{W_1} = 0$. As a consequence,

$$(P_{W_1} + P_{W_2})(P_{W_1} + P_{W_2}) = P_{W_1}^2 + P_{W_2}^2 + P_{W_1} P_{W_2} + P_{W_2} P_{W_1} = P_{W_1} + P_{W_2}.$$

Since the sum of two self-adjoint endomorphisms is self-adjoint, we can conclude (from Proposition 10.3.2) that the sum $P_{W_1} + P_{W_2}$ is an orthogonal projector, with $P_{W_1} + P_{W_2} = P_{W_1 \oplus W_2}$. This means that with two orthogonal subspaces, the sum of the corresponding orthogonal projectors is the orthogonal projection onto the direct sum of the given subspaces.

These results can be extended. If the euclidean space has a finer orthogonal decomposition, that is there are mutually orthogonal subspaces $\{W_a\}_{a=1,\dots,k}$ with $E^n = W_1 \oplus \cdots \oplus W_k$, then we have a corresponding set of orthogonal projectors $P_{W_a}$. We omit the proof of the following proposition, which we shall use later on in the chapter.

Proposition 10.3.7 If $E^n = W_1 \oplus \cdots \oplus W_k$ with mutually orthogonal subspaces $W_a$, $a = 1, \dots, k$, then the following hold:

(a) For any $a, b = 1, \dots, k$, one has

$$P_{W_a} P_{W_b} = \delta_{ab}\, P_{W_a}.$$

(b) If $W = W_{a_1} \oplus \cdots \oplus W_{a_s}$ is the vector subspace given by the direct sum of the orthogonal subspaces $\{W_{a_j}\}$ with $\{a_j\}$ any subset of $(1, \dots, k)$ without repetition, then the sum $P = P_{W_{a_1}} + \cdots + P_{W_{a_s}}$ is the orthogonal projection operator $P = P_W$.

(c) For any $v \in E^n$, one has

$$v = (P_{W_1} + \cdots + P_{W_k})(v).$$

Notice that point (c) shows that the identity operator acting on $E^n$ can be decomposed as the sum of all the orthogonal projectors corresponding to any orthogonal subspace decomposition of $E^n$.

Remark 10.3.8 All we have described for the euclidean space $E^n$ can be naturally extended to the hermitian space $H^n = (\mathbb{C}^n, \cdot)$ introduced in Sect. 3.4. If for example $(e_1, \dots, e_n)$ gives a hermitian orthonormal basis for $H^n$, the orthogonal projection onto $W_a = \mathcal{L}(e_a)$ can be written in the Dirac's notation (see the Exercise 10.3.6) as

$$P_{W_a} = |e_a\rangle\langle e_a|,$$

while the orthogonal projection onto $W = W_{a_1} \oplus \cdots \oplus W_{a_s}$ (point (b) of the Proposition 10.3.7) as

$$P_W = |e_{a_1}\rangle\langle e_{a_1}| + \cdots + |e_{a_s}\rangle\langle e_{a_s}|.$$

The decomposition of the identity operator can be now written as

$$\mathrm{id}_{H^n} = |e_1\rangle\langle e_1| + \cdots + |e_n\rangle\langle e_n|.$$

Thus, any vector $v \in H^n$ can be decomposed as

$$v = |e_1\rangle\langle e_1 | v \rangle + \cdots + |e_n\rangle\langle e_n | v \rangle.$$


10.4 The Diagonalization of Self-adjoint Endomorphisms

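The following theorem is a central result for the diagonalization of real symmetric matrices.

Theorem 10.4.1 Let $A \in \mathbb{R}^{n,n}$ be symmetric, ${}^tA = A$. Then, any root of its characteristic polynomial $p_A(T)$ is real.

Proof Let us assume $\lambda$ to be a root of $p_A(T)$. Although $p_A(T)$ has real coefficients, its roots are in general complex (see the fundamental theorem of algebra, Theorem A.5.7). We therefore think of $A$ as the matrix associated to an endomorphism

$$\phi : \mathbb{C}^n \to \mathbb{C}^n,$$

with $M_\phi^{\mathcal{E},\mathcal{E}} = A$ with respect to the canonical basis $\mathcal{E}$ for $\mathbb{C}^n$ as a complex vector space. Let $v$ be a non zero eigenvector for $\phi$, that is

$$\phi(v) = \lambda v.$$

By denoting with $X$ the column of the components of $v = (x_1, \dots, x_n)$ with respect to $\mathcal{E}$, we write

$${}^tX = (x_1, \dots, x_n), \qquad AX = \lambda X.$$

Under complex conjugation, with $\bar{A} = A$ since $A$ has real entries, we get

$${}^t\bar{X} = (\bar{x}_1, \dots, \bar{x}_n), \qquad A\bar{X} = \bar{\lambda}\bar{X}.$$

From these relations we can write the scalar ${}^t\bar{X} A X$ in the following two ways,

$${}^t\bar{X} A X = {}^t\bar{X}(AX) = {}^t\bar{X}(\lambda X) = \lambda\,({}^t\bar{X} X)$$

and

$${}^t\bar{X} A X = ({}^t\bar{X} A) X = {}^t(A\bar{X}) X = {}^t(\bar{\lambda}\bar{X}) X = \bar{\lambda}\,({}^t\bar{X} X),$$

where the second equality in the last line uses the symmetry of $A$. By equating them, we have

$$(\lambda - \bar{\lambda})\,({}^t\bar{X} X) = 0.$$

The quantity ${}^t\bar{X} X = \bar{x}_1 x_1 + \bar{x}_2 x_2 + \cdots + \bar{x}_n x_n$ is a positive real number, since $v \neq 0_{\mathbb{C}^n}$; we can then conclude $\bar{\lambda} = \lambda$, that is $\lambda \in \mathbb{R}$. □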

Example 10.4.2 The aim of this example is threefold, namely

• it provides an ad hoc proof of the Theorem 10.4.1 for symmetric $2 \times 2$ matrices;

• it provides a direct proof for the Proposition 10.2.2 for symmetric $2 \times 2$ matrices;

• it shows that, if $\phi$ is a self-adjoint endomorphism in $E^2$, then $E^2$ has an orthonormal basis made up of eigenvectors for $\phi$. This result anticipates the general result which will be proven in the Theorem 10.4.5.
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We  consider  then  a  symmetric  matrix  A  e  M2,2, 


{ 011  an 
\012  022 


Its  characteristic  polynomial  Pa(T )  =  det(A  —  T  I2)  is  then 

Pa(T)  =  T2  -  (an  +  a22)T  +  ana22  -  a\2 


The  discriminant  of  this  degree  2  characteristic  polynomial  Pa(T)  is  not  negative: 

A  =  (an  +  a22)2  ~  4(ana22  -  022) 

=  (an—  a22)2  +  4  a\2  >  0 


being  the  sum  of  two  square  terms;  therefore  the  roots  Ai ,  A2  of  pa(T)  are  both  real. 
We  prove  next  that  A  is  diagonalisable,  and  that  the  matrix  P  giving  the  change  of 

c  c 

basis  is  orthogonal.  We  consider  the  endomorphism  c j b  corresponding  to  A  =  M(J)  ’  , 
for  the  canonical  basis  8  for  E 2,  and  compute  the  eigenspaces  V\l  and  V\2. 


• If $\Delta = 0$, then $a_{11} = a_{22}$ and $a_{12} = 0$. The matrix $A$ is already diagonal, so we may take $P = I_2$. There is only one eigenvalue $\lambda_1 = a_{11} = a_{22}$. Its algebraic multiplicity is 2 and its geometric multiplicity is 2, with corresponding eigenspace $V_{\lambda_1} = E^2$.


• If $\Delta > 0$ the characteristic polynomial has two simple roots $\lambda_1 \neq \lambda_2$ with corresponding one dimensional orthogonal (from the Proposition 10.2.2) eigenspaces $V_{\lambda_1}$ and $V_{\lambda_2}$. The change of basis matrix $P$, whose columns are the normalised eigenvectors

$$\frac{v_1}{\|v_1\|} \quad \text{and} \quad \frac{v_2}{\|v_2\|},$$

is then orthogonal by construction. We notice that $P$ can always be chosen to be an element in SO(2), since a permutation of its columns changes the sign of its determinant, and is compatible with the permutation of the eigenvalues $\lambda_1, \lambda_2$ in the diagonal matrix.


In order to explicitly compute the matrix $P$ we see that the eigenspace $V_{\lambda_i}$ for any $i = 1, 2$ is given by the solutions of the linear homogeneous system associated to the matrix

$$A - \lambda_i I_2 = \begin{pmatrix} a_{11} - \lambda_i & a_{12} \\ a_{12} & a_{22} - \lambda_i \end{pmatrix}.$$

Since we already know that $\dim(V_{\lambda_i}) = 1$, such a linear system is equivalent to a single linear equation. We can write

$$V_{\lambda_i} = \{(x, y) : (a_{11} - \lambda_i)x + a_{12}y = 0\} = \mathcal{L}((-a_{12}, a_{11} - \lambda_i)) = \mathcal{L}(v_i),$$

where we set

$$v_1 = (-a_{12}, a_{11} - \lambda_1), \qquad v_2 = (-a_{12}, a_{11} - \lambda_2).$$

For the scalar product,

$$v_1 \cdot v_2 = a_{12}^2 + a_{11}^2 - (\lambda_1 + \lambda_2)\,a_{11} + \lambda_1\lambda_2 = 0$$

since one has

$$\lambda_1 + \lambda_2 = a_{11} + a_{22}, \qquad \lambda_1\lambda_2 = a_{11}a_{22} - a_{12}^2.$$
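
The computations of this example can be checked numerically. The following Python sketch computes the two roots from the discriminant and the vectors $v_i = (-a_{12}, a_{11} - \lambda_i)$. The function name `symmetric_2x2_eigen` and the sample entries are illustrative choices of ours, and the sketch assumes $a_{12} \neq 0$, so that the vectors $v_i$ are non zero.

```python
import math


def symmetric_2x2_eigen(a11, a12, a22):
    """Eigen-data of the symmetric matrix [[a11, a12], [a12, a22]].

    Assumes a12 != 0, so that the vectors (-a12, a11 - lambda_i) are non zero.
    Returns ((lambda_1, v_1), (lambda_2, v_2)).
    """
    # Discriminant of the characteristic polynomial, written as a sum of squares.
    delta = (a11 - a22) ** 2 + 4 * a12 ** 2
    root = math.sqrt(delta)
    lam1 = (a11 + a22 + root) / 2
    lam2 = (a11 + a22 - root) / 2
    v1 = (-a12, a11 - lam1)
    v2 = (-a12, a11 - lam2)
    return (lam1, v1), (lam2, v2)


def dot(u, v):
    """Euclidean scalar product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(u, v))


# Sample symmetric matrix (arbitrary illustrative entries).
(lam1, v1), (lam2, v2) = symmetric_2x2_eigen(2.0, -1.0, 3.0)
print("eigenvalues:", lam1, lam2)
print("v1 . v2 =", dot(v1, v2))  # vanishes up to rounding errors
```

Normalising `v1` and `v2` then gives the columns of an orthogonal matrix $P$.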

Exercise 10.4.3 We consider again the symmetric matrix

$$A = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}$$

from the Exercise 10.2.4. Its characteristic polynomial is

$$p_A(T) = \det(A - T I_2) = T^2 - 5T + 5,$$

with roots

$$\lambda_\pm = \tfrac{1}{2}(5 \pm \sqrt{5}).$$

The corresponding eigenspaces $V_\pm$ are the solutions of the homogeneous linear systems associated to the matrices

$$A - \lambda_\pm I_2 = \frac{1}{2}\begin{pmatrix} -(1 \pm \sqrt{5}) & -2 \\ -2 & 1 \mp \sqrt{5} \end{pmatrix}.$$

One has $\dim(V_\pm) = 1$, so each system is equivalent to a single linear equation, that is

$$V_\pm = \mathcal{L}((-2, 1 \pm \sqrt{5})) = \mathcal{L}(v_\pm),$$

where

$$v_+ = (-2, 1 + \sqrt{5}), \qquad v_- = (-2, 1 - \sqrt{5}),$$

and one computes that

$$v_+ \cdot v_- = 4 - 4 = 0,$$

that is the eigenspaces are orthogonal. The elements

$$u_1 = \frac{v_+}{\|v_+\|} \quad \text{and} \quad u_2 = \frac{v_-}{\|v_-\|}$$

form an orthonormal basis for $E^2$ of eigenvectors for the endomorphism $\phi_A$.

We  present  now  the  fundamental  result  of  this  chapter,  that  is  the  spectral  theorem 
for  self-adjoint  endomorphisms  and  for  symmetric  matrices.  Towards  this,  it  is  worth 
mentioning  that  the  whole  theory,  presented  in  this  chapter  for  the  euclidean  space 
En ,  can  be  naturally  formulated  for  any  finite  dimensional  real  vector  space  equipped 
with  a  scalar  product  (see  Chap.  3). 

Definition 10.4.4 Let $\phi : V \to V$ be an endomorphism of the real vector space $V$, and let $\tilde{V} \subset V$ be a vector subspace in $V$. If the image of $\tilde{V}$ for $\phi$ is a subset of the same $\tilde{V}$ (that is, $\phi(\tilde{V}) \subseteq \tilde{V}$), there is a well defined endomorphism $\phi_{\tilde{V}} : \tilde{V} \to \tilde{V}$ given by

$$\phi_{\tilde{V}}(v) = \phi(v), \quad \text{for all } v \in \tilde{V}$$

(clearly a linear map). The endomorphism $\phi_{\tilde{V}}$ acts in the same way as the endomorphism $\phi$, but on a restricted domain. This is why $\phi_{\tilde{V}}$ is called the restriction to $\tilde{V}$ of $\phi$.

Proposition 10.4.5 (Spectral theorem for endomorphisms) Let $(V, \cdot)$ be a real vector space equipped with a scalar product, and let $\phi \in \mathrm{End}(V)$. The endomorphism $\phi$ is self-adjoint if and only if $V$ has an orthonormal basis of eigenvectors for $\phi$.

Proof Let us assume the orthonormal basis $\mathcal{C}$ for $V$ is made of eigenvectors for $\phi$. This implies that $M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal and therefore symmetric. From the Theorem 10.2.3 we conclude that $\phi$ is self-adjoint.

The proof of the converse is by induction on $n = \dim(V)$. For $n = 2$ the statement is true, as we explicitly proved in the Example 10.4.2. Let us then assume it to be true for any $(n-1)$-dimensional vector space. Then, let us consider a real $n$-dimensional vector space $(V, \cdot)$ equipped with a scalar product, and let $\phi$ be a self-adjoint endomorphism on $V$. With $\mathcal{B}$ an orthonormal basis for $V$ (remember from the Theorem 3.3.9 that such a basis always exists, $V$ being finite dimensional), the matrix $A = M_\phi^{\mathcal{B},\mathcal{B}}$ is symmetric (from the Theorem 10.2.3) and thus any root of the characteristic polynomial $p_A(T)$ is real. Denote by $\lambda$ one such eigenvalue for $\phi$, with $v_1$ a corresponding eigenvector that we can assume of norm 1.

Then, let us consider the orthogonal complement to the vector line spanned by $v_1$,

$$\tilde{V} = (\mathcal{L}(v_1))^\perp.$$

In order to meaningfully define the restriction to $\tilde{V}$ of $\phi$, we have to verify that for any $v \in \tilde{V}$ one has $\phi(v) \in \tilde{V}$, that is, we have to prove the implication

$$v \cdot v_1 = 0 \;\Rightarrow\; \phi(v) \cdot v_1 = 0.$$

By recalling that $\phi$ is self-adjoint and $\phi(v_1) = \lambda v_1$ we can write

$$\phi(v) \cdot v_1 = v \cdot \phi(v_1) = v \cdot (\lambda v_1) = \lambda\,(v \cdot v_1) = 0.$$

This proves that $\phi$ can be restricted to a $\phi_{\tilde{V}} : \tilde{V} \to \tilde{V}$, clearly self-adjoint. Since $\dim(\tilde{V}) = n - 1$, by the inductive assumption there exist $n - 1$ eigenvectors $(v_2, \dots, v_n)$ for $\phi_{\tilde{V}}$ making up an orthonormal basis for $\tilde{V}$. Since $\phi_{\tilde{V}}$ is a restriction of $\phi$, the elements $(v_2, \dots, v_n)$ are eigenvectors for $\phi$ as well, and orthogonal to $v_1$ as they all belong to $\tilde{V}$. Then the elements $(v_1, v_2, \dots, v_n)$ are orthonormal and eigenvectors for $\phi$. Since $n = \dim(V)$, they are an orthonormal basis for $V$. □


10.5  The  Diagonalization  of  Symmetric  Matrices 

There  is  a  counterpart  of  Proposition  10.4.5  for  symmetric  matrices. 

Proposition 10.5.1 (Spectral theorem for symmetric matrices) Let $A \in \mathbb{R}^{n,n}$ be symmetric. There exists an orthogonal matrix $P$ such that ${}^tPAP$ is diagonal. This result is often referred to by saying that symmetric matrices are orthogonally diagonalisable.

Proof Let us consider the endomorphism $\phi = f_A^{\mathcal{E},\mathcal{E}} : E^n \to E^n$, which is self-adjoint since $A$ is symmetric and $\mathcal{E}$ is the canonical basis (see the Theorem 10.2.3). From the Proposition 10.4.5, the space $E^n$ has an orthonormal basis $\mathcal{C}$ of eigenvectors for $\phi$, so the matrix $M_\phi^{\mathcal{C},\mathcal{C}}$ is diagonal. From Theorem 7.9.9 we can write

$$M_\phi^{\mathcal{C},\mathcal{C}} = M^{\mathcal{C},\mathcal{E}}\, M_\phi^{\mathcal{E},\mathcal{E}}\, M^{\mathcal{E},\mathcal{C}}.$$

Since $M_\phi^{\mathcal{E},\mathcal{E}} = A$, by setting $P = M^{\mathcal{E},\mathcal{C}}$ we have that $P^{-1}AP$ is diagonal. The columns of the matrix $P$ are given by the components with respect to $\mathcal{E}$ of the elements in $\mathcal{C}$, so $P$ is orthogonal since $\mathcal{C}$ is orthonormal, and $P^{-1} = {}^tP$. □

Remark 10.5.2 The orthogonal matrix $P$ can always be chosen in SO($n$), since, as already mentioned, the sign of its determinant changes under a permutation of two columns.

Exercise 10.5.3 Consider $\phi \in \mathrm{End}(E^4)$ given by

$$\phi((x, y, z, t)) = (x + y,\; x + y,\; -z + t,\; z - t).$$

Its corresponding matrix with respect to the canonical basis $\mathcal{E}$ in $E^4$ is given by

$$A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}.$$

Since $A$ is symmetric and $\mathcal{E}$ is orthonormal, $\phi$ is self-adjoint. Its characteristic polynomial is

$$p_\phi(T) = p_A(T) = \det(A - T I_4) = \begin{vmatrix} 1 - T & 1 \\ 1 & 1 - T \end{vmatrix} \begin{vmatrix} -1 - T & 1 \\ 1 & -1 - T \end{vmatrix} = T^2 (T - 2)(T + 2).$$


The eigenvalues are then $\lambda_1 = 0$ with (algebraic) multiplicity $m(0) = 2$, $\lambda_2 = -2$ with $m(-2) = 1$ and $\lambda_3 = 2$ with $m(2) = 1$. The corresponding eigenspaces are computed to be

$$V_0 = \ker(\phi) = \mathcal{L}((1, -1, 0, 0), (0, 0, 1, 1)),$$
$$V_{-2} = \ker(\phi + 2 I_4) = \mathcal{L}((0, 0, 1, -1)),$$
$$V_2 = \ker(\phi - 2 I_4) = \mathcal{L}((1, 1, 0, 0))$$

and as we expect, these three eigenspaces are mutually orthogonal, with the two basis vectors spanning $V_0$ orthogonal as well. In order to write the matrix $P$ which diagonalises $A$ one just needs to normalise such a system of four basis eigenvectors. We have

$$P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 1 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{pmatrix}, \qquad {}^tPAP = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix},$$

where the third and fourth columns of $P$ span $V_2$ and $V_{-2}$ respectively, and we have chosen an ordering for the eigenvalues that gives $\det(P) = 1$.
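
As a quick check of this exercise, the following Python sketch computes ${}^tPAP$ and ${}^tPP$ exactly. To avoid square roots it works with the integer matrix $Q = \sqrt{2}\,P$ (a device of ours, not part of the exercise), so that ${}^tPAP = ({}^tQAQ)/2$; the helper names are illustrative.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


# Matrix of the endomorphism with respect to the canonical basis.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, -1, 1],
     [0, 0, 1, -1]]

# Q = sqrt(2) * P: its columns are the unnormalised eigenvectors,
# each of norm sqrt(2), so that tP A P = (tQ A Q) / 2.
Q = [[1, 0, 1, 0],
     [-1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 1, 0, -1]]

D = [[Fraction(x, 2) for x in row] for row in matmul(transpose(Q), matmul(A, Q))]
G = [[Fraction(x, 2) for x in row] for row in matmul(transpose(Q), Q)]  # tP P

for row in D:
    print([str(x) for x in row])
```

The printed matrix is diagonal, and its diagonal lists the eigenvalues in the order fixed by the columns of $P$; the matrix `G` is the identity, confirming that $P$ is orthogonal.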

Corollary 10.5.4 Let $\phi \in \mathrm{End}(E^n)$. If the endomorphism $\phi$ is self-adjoint then it is simple.

Proof The proof is immediate. From the Proposition 10.4.5 we know that the self-adjointness of $\phi$ implies that $E^n$ has an orthonormal basis of eigenvectors for $\phi$. From the Remark 9.2.3 we conclude that $\phi$ is simple. □

Exercise 10.5.5 The converse of the previous corollary does not hold in general. Consider for example the endomorphism $\phi$ in $E^2$ whose matrix with respect to the canonical basis $\mathcal{E}$ is

$$A = \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix}.$$

An easy calculation gives for the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = -1$, and $\phi$ is (see the Corollary 9.4.2) therefore simple. But $\phi$ is not self-adjoint, since

$$\phi(e_1) \cdot e_2 = (1, 0) \cdot (0, 1) = 0,$$
$$e_1 \cdot \phi(e_2) = (1, 0) \cdot (1, -1) = 1,$$

or simply because $A$ is not symmetric. The eigenspaces are given by

$$V_1 = \mathcal{L}((1, 0)), \qquad V_{-1} = \mathcal{L}((1, -2)),$$

and they are not orthogonal. As a further remark, notice that the diagonalising matrix, whose columns are the eigenvectors above,

$$P = \begin{pmatrix} 1 & 1 \\ 0 & -2 \end{pmatrix}$$

is not orthogonal.
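
The claims of this exercise are easy to verify with a few lines of Python. In this sketch the matrix `A` is the one determined by the values $\phi(e_1) = (1, 0)$ and $\phi(e_2) = (1, -1)$ used above; the helper names are ours.

```python
def matvec(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)


def dot(u, v):
    """Euclidean scalar product."""
    return sum(x * y for x, y in zip(u, v))


A = [[1, 1],
     [0, -1]]
e1, e2 = (1, 0), (0, 1)
v_plus, v_minus = (1, 0), (1, -2)   # spanning vectors of V_1 and V_{-1}

# Eigenvector equations: A v = v and A v = -v.
print(matvec(A, v_plus), matvec(A, v_minus))
# The two sides of the self-adjointness condition differ.
print(dot(matvec(A, e1), e2), dot(e1, matvec(A, e2)))
# The eigenspaces are not orthogonal.
print(dot(v_plus, v_minus))
```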

What  we  have  shown  in  the  previous  exercise  is  a  general  property  characterising 
self-adjoint  endomorphisms  within  the  class  of  simple  endomorphisms,  as  the  next 
theorem  shows. 

Theorem 10.5.6 Let $\phi \in \mathrm{End}(E^n)$ be simple, with $V_{\lambda_1}, \dots, V_{\lambda_s}$ the corresponding eigenspaces. Then $\phi$ is self-adjoint if and only if $V_{\lambda_i} \perp V_{\lambda_j}$ for any $i \neq j$.

Proof That the eigenspaces corresponding to distinct eigenvalues are orthogonal for a self-adjoint endomorphism comes directly from the Proposition 10.2.2.

Conversely, let us assume that $\phi$ is simple, so that $E^n = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_s}$, with mutually orthogonal eigenspaces. The union of the bases given by applying the Gram-Schmidt orthogonalisation procedure to an arbitrary basis for each $V_{\lambda_j}$ yields an orthonormal basis for $E^n$, which is clearly made of eigenvectors for $\phi$. The statement then follows from the Proposition 10.4.5. □

Exercise 10.5.7 The aim of this exercise is to define (if possible) a self-adjoint endomorphism $\phi : E^3 \to E^3$ such that $\ker(\phi) = \mathcal{L}((1, 2, 1))$ and $\lambda_1 = 1$, $\lambda_2 = 2$ are eigenvalues of $\phi$.

Since $\ker(\phi) \neq \{(0, 0, 0)\}$, then $\lambda_3 = 0$ is the third eigenvalue for $\phi$, with $\ker(\phi) = V_0$. Thus $\phi$ is simple since it has three distinct eigenvalues, with $E^3 = V_1 \oplus V_2 \oplus V_0$. In order for $\phi$ to be self-adjoint, we have to impose that $V_{\lambda_i} \perp V_{\lambda_j}$ for all $i \neq j$. In particular, one has

$$(\ker(\phi))^\perp = (V_0)^\perp = V_1 \oplus V_2.$$

We compute

$$(\ker(\phi))^\perp = (\mathcal{L}((1, 2, 1)))^\perp = \{(\alpha, \beta, -\alpha - 2\beta) : \alpha, \beta \in \mathbb{R}\} = \mathcal{L}((1, 0, -1), (a, b, c))$$

where we impose that $(a, b, c)$ belongs to $\mathcal{L}((1, 2, 1))^\perp$ and is orthogonal to $(1, 0, -1)$. By setting

$$\begin{cases} (1, 2, 1) \cdot (a, b, c) = 0 \\ (1, 0, -1) \cdot (a, b, c) = 0 \end{cases},$$

we have $(a, b, c) = (1, -1, 1)$, so we select

$$V_1 = \mathcal{L}((1, 0, -1)), \qquad V_2 = \mathcal{L}((1, -1, 1)).$$

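Having a simple $\phi$ with mutually orthogonal eigenspaces, the endomorphism $\phi$ is self-adjoint. To get a matrix representing $\phi$ we can choose the basis in $E^3$

$$\mathcal{B} = ((1, 0, -1), (1, -1, 1), (1, 2, 1)),$$

thus obtaining

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

By defining the normalised vectors $e_1 = \frac{1}{\sqrt{2}}(1, 0, -1)$, $e_2 = \frac{1}{\sqrt{3}}(1, -1, 1)$ we can write, in Dirac's notation,

$$\phi = |e_1\rangle\langle e_1| + 2\,|e_2\rangle\langle e_2|.$$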
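
The decomposition of $\phi$ as a weighted sum of orthogonal projectors can be turned into an explicit matrix with respect to the canonical basis. The sketch below is an illustration under our own naming; it uses the fact that the projector onto the line spanned by $v$ has entries $v_i v_j / (v \cdot v)$, which takes care of the normalisation of the eigenvectors, and it works with exact fractions.

```python
from fractions import Fraction


def projector(v):
    """Matrix of the orthogonal projection onto the line spanned by v."""
    norm2 = sum(x * x for x in v)
    return [[Fraction(x * y, norm2) for y in v] for x in v]


def matvec(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


P1 = projector((1, 0, -1))   # projector onto V_1
P2 = projector((1, -1, 1))   # projector onto V_2

# phi = 1 * P1 + 2 * P2, written in the canonical basis.
PHI = [[P1[i][j] + 2 * P2[i][j] for j in range(3)] for i in range(3)]

for row in PHI:
    print([str(x) for x in row])
print(matvec(PHI, (1, 2, 1)))   # the kernel direction is sent to zero
```

The resulting matrix is symmetric, as expected for a self-adjoint endomorphism written with respect to an orthonormal basis.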


Exercise 10.5.8 This exercise defines a simple, but not self-adjoint, endomorphism $\phi : E^3 \to E^3$ such that $\ker(\phi) = \mathcal{L}((1, -1, 1))$ and $\mathrm{Im}(\phi) = (\ker(\phi))^\perp$.

We know that $\phi$ has the eigenvalue $\lambda_1 = 0$ with $V_0 = \ker(\phi)$. For $\phi$ to be simple, the algebraic multiplicity of the eigenvalue $\lambda_1$ must be 1, and there have to be two additional eigenvalues $\lambda_2$ and $\lambda_3$ with either $\lambda_2 = \lambda_3$ or $\lambda_2 \neq \lambda_3$. If $\lambda_2 = \lambda_3$, one has then

$$V_{\lambda_2} = \mathrm{Im}(\phi) = (\ker(\phi))^\perp = (V_{\lambda_1})^\perp.$$

In such a case, $\phi$ would be a simple endomorphism with mutually orthogonal eigenspaces for distinct eigenvalues. This would imply $\phi$ to be self-adjoint. Thus to satisfy the conditions we require for $\phi$ we need $\lambda_2 \neq \lambda_3$. In such a case, one has $V_{\lambda_2} \oplus V_{\lambda_3} = \mathrm{Im}(\phi)$ and also clearly $V_{\lambda_j} \perp V_0$ for $j = 2, 3$. In order for $\phi$ to be simple but not self-adjoint, we select the eigenspaces $V_{\lambda_2}$ and $V_{\lambda_3}$ to be not mutually orthogonal subspaces in $\mathrm{Im}(\phi)$. Since

$$\mathrm{Im}(\phi) = (\mathcal{L}((1, -1, 1)))^\perp = \{(x, y, z) : x - y + z = 0\} = \mathcal{L}((1, 1, 0), (0, 1, 1))$$

we can choose

$$V_{\lambda_2} = \mathcal{L}((1, 1, 0)), \qquad V_{\lambda_3} = \mathcal{L}((0, 1, 1)).$$

If we set $\mathcal{B} = ((1, -1, 1), (1, 1, 0), (0, 1, 1))$ (clearly not an orthonormal basis for $E^3$), we have

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}.$$


Exercise 10.5.9 Consider the endomorphism $\phi : E^3 \to E^3$ whose corresponding matrix with respect to the basis $\mathcal{B} = (v_1 = (1, 1, 0),\; v_2 = (1, -1, 0),\; v_3 = (0, 0, -1))$ is

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$

With $\mathcal{E}$ as usual the canonical basis for $E^3$, in this exercise we would like to determine:

(1) an orthonormal basis $\mathcal{C}$ for $E^3$ given by eigenvectors for $\phi$,

(2) the orthogonal matrix $M^{\mathcal{E},\mathcal{C}}$,

(3) the matrix $M^{\mathcal{C},\mathcal{E}}$,

(4) the matrix $M_\phi^{\mathcal{E},\mathcal{E}}$,

(5) the eigenvalues of $\phi$ with their corresponding multiplicities.

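(1) We start by noticing that, since $M_\phi^{\mathcal{B},\mathcal{B}}$ is diagonal, the basis $\mathcal{B}$ is given by eigenvectors of $\phi$, as the action of $\phi$ on the basis vectors in $\mathcal{B}$ can be clearly written as $\phi(v_1) = v_1$, $\phi(v_2) = 2v_2$, $\phi(v_3) = 3v_3$. The basis $\mathcal{B}$ is indeed orthogonal, but not orthonormal, and for an orthonormal basis $\mathcal{C}$ of eigenvectors for $\phi$ we just need to normalise, that is to consider

$$u_1 = \frac{v_1}{\|v_1\|}, \qquad u_2 = \frac{v_2}{\|v_2\|}, \qquad u_3 = \frac{v_3}{\|v_3\|},$$

obtaining $\mathcal{C} = (\tfrac{1}{\sqrt{2}}(1, 1, 0),\; \tfrac{1}{\sqrt{2}}(1, -1, 0),\; (0, 0, -1))$. While the existence of such a basis $\mathcal{C}$ implies that $\phi$ is self-adjoint, the self-adjointness of $\phi$ could not be derived from the matrix $M_\phi^{\mathcal{B},\mathcal{B}}$, which is symmetric with respect to a basis $\mathcal{B}$ which is not orthonormal.

(2) From its definition, the columns of $M^{\mathcal{E},\mathcal{C}}$ are given by the components with respect to $\mathcal{E}$ of the vectors in $\mathcal{C}$. We then have

$$M^{\mathcal{E},\mathcal{C}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -\sqrt{2} \end{pmatrix}.$$

(3) We know that $M^{\mathcal{C},\mathcal{E}} = (M^{\mathcal{E},\mathcal{C}})^{-1}$. Since the matrix above is orthogonal, we have

$$M^{\mathcal{C},\mathcal{E}} = {}^t(M^{\mathcal{E},\mathcal{C}}) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & -\sqrt{2} \end{pmatrix}.$$

(4) From the Theorem 7.9.9 we have

$$M_\phi^{\mathcal{E},\mathcal{E}} = M^{\mathcal{E},\mathcal{C}}\, M_\phi^{\mathcal{C},\mathcal{C}}\, M^{\mathcal{C},\mathcal{E}}.$$

Since $M_\phi^{\mathcal{C},\mathcal{C}} = M_\phi^{\mathcal{B},\mathcal{B}}$, the matrix $M_\phi^{\mathcal{E},\mathcal{E}}$ can be now directly computed.

(5) Clearly, from $M_\phi^{\mathcal{B},\mathcal{B}}$ the eigenvalues for $\phi$ are all simple and given by $\lambda = 1, 2, 3$.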
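
The product in point (4) can be carried out numerically. The following Python sketch (with helper names of our own choosing, and floating point arithmetic) builds the orthogonal matrix whose columns are the normalised eigenvectors and conjugates the diagonal matrix of eigenvalues with it.

```python
import math


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


s = 1 / math.sqrt(2)
# Change of basis matrix: its columns are u1, u2, u3 in canonical components.
P = [[s, s, 0.0],
     [s, -s, 0.0],
     [0.0, 0.0, -1.0]]
# Diagonal matrix of the eigenvalues 1, 2, 3.
D = [[1.0, 0.0, 0.0],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]

# Since P is orthogonal, its inverse is its transpose.
M_canonical = matmul(P, matmul(D, transpose(P)))
for row in M_canonical:
    print([round(x, 12) for x in row])
```

The result is a symmetric matrix, in agreement with the self-adjointness of $\phi$ and the orthonormality of the canonical basis.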


Chapter  11 

Rotations 



The  notion  of  rotation  appears  naturally  in  physics,  and  is  geometrically  formulated  in 
terms  of  a  euclidean  structure  as  a  suitable  linear  map  on  a  real  vector  space.  The  aim 
of  this  chapter  is  to  analyse  the  main  properties  of  rotations  using  the  spectral  theory 
previously  developed,  as  well  as  to  recover  known  results  from  classical  mechanics, 
using  the  geometric  language  we  are  describing. 


11.1  Skew- Adjoint  Endomorphisms 

In  analogy  to  the  Definition  10.2.1  of  a  self-adjoint  endomorphism,  we  have  the 
following. 

Definition 11.1.1 An endomorphism $\phi$ of the euclidean vector space $E^n$ is called skew-adjoint if

$$\phi(v) \cdot w = -\,v \cdot \phi(w), \quad \text{for all } v, w \in E^n.$$

From the Definition 4.1.7 we call a matrix $A = (a_{ij}) \in \mathbb{R}^{n,n}$ skew-symmetric (or antisymmetric) if ${}^tA = -A$, that is if $a_{ij} = -a_{ji}$, for any $i, j$. Notice that the skew-symmetry condition for $A$ clearly implies for its diagonal elements that $a_{ii} = 0$. The following result is an analogue of the Theorem 10.2.3 and can be established in a similar manner.

Theorem 11.1.2 Let $\phi \in \mathrm{End}(E^n)$ and $\mathcal{B}$ an orthonormal basis for $E^n$. The endomorphism $\phi$ is skew-adjoint if and only if $M_\phi^{\mathcal{B},\mathcal{B}}$ is skew-symmetric.

Proposition 11.1.3 Let $\phi \in \mathrm{End}(E^n)$ be skew-adjoint. It holds that

(a) the euclidean vector space $E^n$ has an orthogonal decomposition

$$E^n = \mathrm{Im}(\phi) \oplus \ker(\phi),$$

(b) the rank of $\phi$ is even.

Proof (a) Let $u \in E^n$ and $v \in \ker(\phi)$. We can write

$$0 = u \cdot \phi(v) = -\,\phi(u) \cdot v.$$

Since this is valid for any $u \in E^n$, the element $\phi(u)$ ranges over the whole space $\mathrm{Im}(\phi)$, so we have that $\ker(\phi) = (\mathrm{Im}(\phi))^\perp$.

(b) From ${}^tM_\phi^{\mathcal{B},\mathcal{B}} = -M_\phi^{\mathcal{B},\mathcal{B}}$, it follows $\det(M_\phi^{\mathcal{B},\mathcal{B}}) = (-1)^n \det(M_\phi^{\mathcal{B},\mathcal{B}})$. Thus a skew-adjoint endomorphism on an odd dimensional euclidean space is singular (that is it is not invertible). From the orthogonal decomposition for $E^n$ of point (a) we conclude that the restriction $\phi_{\mathrm{Im}(\phi)} : \mathrm{Im}(\phi) \to \mathrm{Im}(\phi)$ is regular (that is it is invertible). Since such a restriction is skew-adjoint, we have that $\dim(\mathrm{Im}(\phi)) = \mathrm{rk}(\phi)$ is even. □

A skew-adjoint endomorphism $\phi$ on $E^n$ can have only zero as a (real) eigenvalue, so, unless it is the zero map, it is not diagonalisable. Indeed, if $\lambda$ is an eigenvalue for $\phi$, that is $\phi(v) = \lambda v$ for some $v \neq 0_{E^n}$ in $E^n$, from the skew-symmetry condition we have that $0 = v \cdot \phi(v) = \lambda\, v \cdot v$, which implies $\lambda = 0$. Also, when $\phi \neq 0$ its characteristic polynomial has non real roots, so it does not have a Jordan form (see Theorem 9.5.1).

Although  not  diagonalisable,  a  skew-adjoint  endomorphism  has  nonetheless  a 
canonical  form. 

Proposition 11.1.4 Given a skew-adjoint invertible endomorphism $\phi : E^{2p} \to E^{2p}$, there exists an orthonormal basis $\mathcal{B}$ for $E^{2p}$ with respect to which the representing matrix for $\phi$ is of the form

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 0 & \mu_1 & & & \\ -\mu_1 & 0 & & & \\ & & \ddots & & \\ & & & 0 & \mu_p \\ & & & -\mu_p & 0 \end{pmatrix}$$

with $\mu_j \in \mathbb{R}$ for $j = 1, \dots, p$.

Proof The map $S = \phi^2 = \phi \circ \phi$ is a self-adjoint endomorphism on $E^{2p}$, so there exists an orthonormal basis of eigenvectors for $S$. Given $S(w_j) = \lambda_j w_j$ with $\lambda_j \in \mathbb{R}$, each eigenvalue $\lambda_j$ has even multiplicity, since the identity

$$S(\phi(w_j)) = \phi(S(w_j)) = \lambda_j\,\phi(w_j)$$

shows that $w_j$ and $\phi(w_j)$ are eigenvectors of $S$ with the same eigenvalue $\lambda_j$. We label then the spectrum of $S$ by $(\lambda_1, \dots, \lambda_p)$ and the basis $\mathcal{C} = (w_1, \dots, w_p, \phi(w_1), \dots, \phi(w_p))$.

We also have

$$\lambda_j = w_j \cdot S(w_j) = w_j \cdot \phi^2(w_j) = -\,\phi(w_j) \cdot \phi(w_j) = -\|\phi(w_j)\|^2$$

and, since we took $\phi$ to be invertible, we have $\lambda_j < 0$. Define the set $\mathcal{B} = (e_1, \dots, e_{2p})$ of vectors as

$$e_{2j-1} = w_j, \qquad e_{2j} = \frac{1}{\sqrt{|\lambda_j|}}\,\phi(w_j)$$

for $j = 1, \dots, p$. A direct computation shows that $e_j \cdot e_k = \delta_{jk}$ with $j, k = 1, \dots, 2p$ and

$$\phi(e_{2j-1}) = \sqrt{|\lambda_j|}\; e_{2j}, \qquad \phi(e_{2j}) = -\sqrt{|\lambda_j|}\; e_{2j-1}.$$

Thus $\mathcal{B}$ is an orthonormal basis with respect to which the matrix representing the endomorphism $\phi$ has the form above, with $\mu_j = -\sqrt{|\lambda_j|}$; the sign is immaterial, since replacing $e_{2j}$ by $-e_{2j}$ changes $\mu_j$ into $-\mu_j$. □

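Corollary 11.1.5 If $\phi$ is a skew-adjoint endomorphism on $E^n$, then there exists an orthonormal basis $\mathcal{B}$ for $E^n$ with respect to which the associated matrix of $\phi$ has the form

$$M_\phi^{\mathcal{B},\mathcal{B}} = \begin{pmatrix} 0 & \mu_1 & & & & \\ -\mu_1 & 0 & & & & \\ & & \ddots & & & \\ & & & 0 & \mu_p & \\ & & & -\mu_p & 0 & \\ & & & & & 0 \end{pmatrix}$$

with $\mu_j \in \mathbb{R}$, $j = 1, \dots, p$, and $2p \leq n$; the last entry denotes a zero block of size $n - 2p$.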


The study of antisymmetric matrices makes it natural to introduce the notion of Lie algebra.

Definition 11.1.6 Given $A, B \in \mathbb{R}^{n,n}$, one defines the map $[\;,\;] : \mathbb{R}^{n,n} \times \mathbb{R}^{n,n} \to \mathbb{R}^{n,n}$,

$$[A, B] = AB - BA$$

as the commutator of $A$ and $B$. Using the properties of the matrix product it is easy to prove that the following hold, for any $A, B, C \in \mathbb{R}^{n,n}$ and any $\alpha \in \mathbb{R}$:

(1) $[A, B] = -[B, A]$, $[\alpha A, B] = \alpha [A, B]$, $[A + B, C] = [A, C] + [B, C]$, that is the commutator is bilinear and antisymmetric,

(2) $[AB, C] = A[B, C] + [A, C]B$,

(3) $[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0$; this is called the Jacobi identity.

Definition 11.1.7 If $W \subset \mathbb{R}^{n,n}$ is a vector subspace such that the commutator maps $W \times W$ into $W$, we say that $W$ is a (matrix) Lie algebra. Its rank is the dimension of $W$ as a vector space.

Exercise 11.1.8 The collection of all antisymmetric matrices $W_A \subset \mathbb{R}^{n,n}$ is a matrix Lie algebra since, if ${}^tA = -A$ and ${}^tB = -B$ it is

$${}^t([A, B]) = {}^tB\,{}^tA - {}^tA\,{}^tB = BA - AB = -[A, B].$$

As a Lie algebra, it is denoted $\mathfrak{so}(n)$ and one easily computes its dimension to be $n(n-1)/2$. As we shall see, this Lie algebra has a deep relation with the orthogonal group SO($n$).

Remark 11.1.9 It is worth noticing that the vector space $W_S \subset \mathbb{R}^{n,n}$ of symmetric matrices is not a matrix Lie algebra, since the commutator of two symmetric matrices is an antisymmetric matrix.


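Exercise 11.1.10 It is clear that the matrices

$$L_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \qquad L_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

provide a basis for the three dimensional real vector space of antisymmetric matrices $W_A \subset \mathbb{R}^{3,3}$. As the matrix Lie algebra $\mathfrak{so}(3)$, one computes the commutators:

$$[L_1, L_2] = L_3, \qquad [L_2, L_3] = L_1, \qquad [L_3, L_1] = L_2.$$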
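
These commutation relations can be verified mechanically. In the Python sketch below the helper names are ours, and $L_2$ is taken to be the matrix for which $\alpha L_1 + \beta L_2 + \gamma L_3$ is the general $3 \times 3$ antisymmetric matrix.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(X, Y):
    """The commutator [X, Y] = XY - YX."""
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(len(X))] for i in range(len(X))]


L1 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
L2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
L3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

print(commutator(L1, L2) == L3)
print(commutator(L2, L3) == L1)
print(commutator(L3, L1) == L2)
```

Each commutator is again antisymmetric, which is the closure property required of a matrix Lie algebra.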


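Exercise 11.1.11 We consider the most general skew-adjoint endomorphism $\phi$ on $E^3$. With respect to the canonical orthonormal basis $\mathcal{E} = (e_1, e_2, e_3)$ it has associated matrix of the form

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix} = \alpha L_1 + \beta L_2 + \gamma L_3,$$

with $\alpha, \beta, \gamma \in \mathbb{R}$. Any vector $(x, y, z)$ in its kernel is a solution of the system

$$\begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

It is easy to show that the kernel is one-dimensional with $\ker(\phi) = \mathcal{L}((\alpha, \beta, \gamma))$.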
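Since $\phi$ is defined on a three dimensional space and has a one-dimensional kernel, from the Proposition 11.1.4 the spectrum of the map $S = \phi^2$ is made of the simple eigenvalue $\lambda_0 = 0$ and a multiplicity 2 eigenvalue $\lambda < 0$, which is such that $2\lambda = \mathrm{tr}(M_S^{\mathcal{E},\mathcal{E}})$ with

$$M_S^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix} \begin{pmatrix} 0 & -\gamma & \beta \\ \gamma & 0 & -\alpha \\ -\beta & \alpha & 0 \end{pmatrix} = \begin{pmatrix} -\gamma^2 - \beta^2 & \alpha\beta & \alpha\gamma \\ \alpha\beta & -\gamma^2 - \alpha^2 & \beta\gamma \\ \alpha\gamma & \beta\gamma & -\beta^2 - \alpha^2 \end{pmatrix};$$

thus $\lambda = -(\alpha^2 + \beta^2 + \gamma^2)$. For the corresponding eigenspace $V_\lambda \ni (x, y, z)$ one has

$$\begin{pmatrix} \alpha^2 & \alpha\beta & \alpha\gamma \\ \alpha\beta & \beta^2 & \beta\gamma \\ \alpha\gamma & \beta\gamma & \gamma^2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \quad \Leftrightarrow \quad \begin{cases} \alpha(\alpha x + \beta y + \gamma z) = 0 \\ \beta(\alpha x + \beta y + \gamma z) = 0 \\ \gamma(\alpha x + \beta y + \gamma z) = 0 \end{cases}.$$

Such a linear system is equivalent to the single equation $\alpha x + \beta y + \gamma z = 0$, which shows that $\ker(S)$ is orthogonal to $\mathrm{Im}(S)$. To be definite, assume $\alpha \neq 0$, and fix as basis for $V_\lambda$

$$w_1 = (-\gamma, 0, \alpha),$$
$$w_2 = \phi(w_1) = (\alpha\beta,\; -(\alpha^2 + \gamma^2),\; \beta\gamma),$$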


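with $w_1 \cdot w_2 = 0$. With the appropriate normalisation, we define

$$u_1 = \frac{w_1}{\|w_1\|}, \qquad u_2 = \frac{\phi(w_1)}{\|\phi(w_1)\|}, \qquad u_3 = \frac{1}{\sqrt{\alpha^2 + \beta^2 + \gamma^2}}\,(\alpha, \beta, \gamma)$$

and verify that $\mathcal{C} = (u_1, u_2, u_3)$ is an orthonormal basis for $E^3$. With $M^{\mathcal{C},\mathcal{E}}$ the orthogonal matrix of change of bases (see the Theorem 7.9.9), this leads to

$$M^{\mathcal{C},\mathcal{E}}\, M_\phi^{\mathcal{E},\mathcal{E}}\, M^{\mathcal{E},\mathcal{C}} = M_\phi^{\mathcal{C},\mathcal{C}} = \begin{pmatrix} 0 & -\rho & 0 \\ \rho & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \rho = \sqrt{|\lambda|} = \sqrt{\alpha^2 + \beta^2 + \gamma^2},$$

an example indeed of Corollary 11.1.5.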
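
The main steps of this exercise can be tested on a concrete case. The sketch below picks the sample values $\alpha = 1$, $\beta = 2$, $\gamma = 2$ (an arbitrary choice of ours, for which $\alpha^2 + \beta^2 + \gamma^2 = 9$) and checks, with integer arithmetic, the kernel, the eigenvalue of $S = \phi^2$ on $w_1$, and the norm relation $\|\phi(w_1)\|^2 = |\lambda|\,\|w_1\|^2$.

```python
def matvec(M, v):
    """Apply the matrix M (list of rows) to the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Euclidean scalar product."""
    return sum(x * y for x, y in zip(u, v))


alpha, beta, gamma = 1, 2, 2   # sample values, chosen for illustration
M = [[0, -gamma, beta],
     [gamma, 0, -alpha],
     [-beta, alpha, 0]]

kernel_vector = [alpha, beta, gamma]
w1 = [-gamma, 0, alpha]
phi_w1 = matvec(M, w1)          # image of w1 under phi
S_w1 = matvec(M, phi_w1)        # image of w1 under S = phi o phi
lam = -(alpha ** 2 + beta ** 2 + gamma ** 2)

print(matvec(M, kernel_vector))          # the zero vector
print(phi_w1, dot(w1, phi_w1))           # phi(w1) is orthogonal to w1
print(S_w1, [lam * x for x in w1])       # S(w1) = lambda * w1
print(dot(phi_w1, phi_w1), -lam * dot(w1, w1))
```

With these values the ratio $\|\phi(w_1)\| / \|w_1\|$ equals 3, the square root of $|\lambda| = 9$.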
11.2 The Exponential of a Matrix


In Sect. 10.1 we studied the properties of the orthogonal group O($n$) in $E^n$. Before studying the spectral properties of orthogonal matrices we recall some general results.

Definition 11.2.1 Given a matrix $A \in \mathbb{R}^{n,n}$, its exponential is the matrix $e^A$ defined by

$$e^A = \sum_{k=0}^{\infty} \frac{1}{k!}\, A^k$$

where the sum is defined component-wise, that is $(e^A)_{jl} = \sum_{k=0}^{\infty} \frac{1}{k!}\, (A^k)_{jl}$.

We  omit  the  proof  that  such  a  limit  exists  (that  is  each  series  converges)  for  every 
matrix  A,  and  we  omit  as  well  the  proof  of  the  following  proposition,  which  lists 
several  properties  of  the  exponential  maps  on  matrices. 

Proposition 11.2.2 Given matrices $A, B \in \mathbb{R}^{n,n}$ and an invertible matrix $P \in \mathrm{GL}(n)$, the following identities hold:

(a) $e^A \in \mathrm{GL}(n)$, that is the matrix $e^A$ is invertible, with $(e^A)^{-1} = e^{-A}$ and $\det(e^A) = e^{\mathrm{tr} A}$,

(b) if $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$, then $e^A = \mathrm{diag}(e^{a_{11}}, \dots, e^{a_{nn}})$,

(c) $e^{PAP^{-1}} = P e^A P^{-1}$,

(d) if $AB = BA$, that is $[A, B] = 0$, then $e^A e^B = e^B e^A = e^{A+B}$,

(e) it is $e^{\,{}^tA} = {}^t(e^A)$,

(f) if $W \subset \mathbb{R}^{n,n}$ is a matrix Lie algebra, the elements $e^M$ with $M \in W$ form a group with respect to the matrix product.

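Exercise 11.2.3 Let us determine the exponential $e^Q$ of the symmetric matrix

$$Q = \begin{pmatrix} 0 & a \\ a & 0 \end{pmatrix}.$$

We can proceed in two ways. On the one hand, it is easy to see that

$$Q^{2k} = \begin{pmatrix} a^{2k} & 0 \\ 0 & a^{2k} \end{pmatrix}, \qquad Q^{2k+1} = \begin{pmatrix} 0 & a^{2k+1} \\ a^{2k+1} & 0 \end{pmatrix}.$$

Thus, by using the definition we compute

$$e^Q = \begin{pmatrix} \sum_{k=0}^{\infty} \frac{a^{2k}}{(2k)!} & \sum_{k=0}^{\infty} \frac{a^{2k+1}}{(2k+1)!} \\ \sum_{k=0}^{\infty} \frac{a^{2k+1}}{(2k+1)!} & \sum_{k=0}^{\infty} \frac{a^{2k}}{(2k)!} \end{pmatrix} = \begin{pmatrix} \cosh a & \sinh a \\ \sinh a & \cosh a \end{pmatrix}.$$

Alternatively, we can use the identities (c) and (b) in the previous proposition, once $Q$ has been diagonalised. It is easy to compute the eigenvalues of $Q$ to be $\lambda_\pm = \pm a$, with diagonalising orthogonal matrix $P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$. That is, $P \Lambda_Q P^{-1} = Q$ with $\Lambda_Q = \mathrm{diag}(-a, a)$,

$$\frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} -a & 0 \\ 0 & a \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & a \\ a & 0 \end{pmatrix}.$$

We then compute

$$e^Q = e^{P \Lambda_Q P^{-1}} = P e^{\Lambda_Q} P^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} e^{-a} & 0 \\ 0 & e^{a} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \cosh a & \sinh a \\ \sinh a & \cosh a \end{pmatrix}.$$

Notice that $\det(e^Q) = \cosh^2 a - \sinh^2 a = 1 = e^{\mathrm{tr}\, Q}$.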
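
The defining series of the exponential can also be summed numerically. The following Python sketch truncates the series after a fixed number of terms (the function name `mat_exp`, the truncation order and the sample value of $a$ are our own assumptions) and compares the result with the hyperbolic functions found above.

```python
import math


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=30):
    """Truncated exponential series: the sum of A^k / k! for k < terms."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]              # A^0 / 0!
    for k in range(1, terms):
        # Build A^k / k! from A^(k-1) / (k-1)!.
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


a = 0.7                                           # sample value
E = mat_exp([[0.0, a], [a, 0.0]])
print(E)
print(math.cosh(a), math.sinh(a))
print(E[0][0] * E[1][1] - E[0][1] * E[1][0])      # determinant, close to 1
```

For matrices with large entries a plain truncation of the series is numerically poor; the sketch is only meant for small examples like this one.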

Exercise 11.2.4 Let us determine the exponential $e^M$ of the anti-symmetric matrix

$$M = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, \qquad a \in \mathbb{R}.$$

Since $M$ is not diagonalisable over $\mathbb{R}$ (for $a \neq 0$), we explicitly compute $e^M$ as we did in the previous exercise, finding

$$M^{2k} = (-1)^k \begin{pmatrix} a^{2k} & 0 \\ 0 & a^{2k} \end{pmatrix}, \qquad M^{2k+1} = (-1)^k \begin{pmatrix} 0 & a^{2k+1} \\ -a^{2k+1} & 0 \end{pmatrix}.$$

By putting together all terms, one finds

$$e^M = \begin{pmatrix} \sum_{k=0}^{\infty} (-1)^k \frac{a^{2k}}{(2k)!} & \sum_{k=0}^{\infty} (-1)^k \frac{a^{2k+1}}{(2k+1)!} \\ -\sum_{k=0}^{\infty} (-1)^k \frac{a^{2k+1}}{(2k+1)!} & \sum_{k=0}^{\infty} (-1)^k \frac{a^{2k}}{(2k)!} \end{pmatrix} = \begin{pmatrix} \cos a & \sin a \\ -\sin a & \cos a \end{pmatrix}.$$

We see that if $M$ is a $2 \times 2$ anti-symmetric matrix, the matrix $e^M$ is special orthogonal. This is an example for the point (f) in the Proposition 11.2.2.

Exercise 11.2.5 In order to further explore the relations between anti-symmetric matrices and special orthogonal matrices, consider the matrix

$$M = \begin{pmatrix} 0 & a & 0 \\ -a & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad a \in \mathbb{R}.$$

In parallel with the computations from the previous exercise, it is immediate to see that

$$e^M = \begin{pmatrix} \cos a & \sin a & 0 \\ -\sin a & \cos a & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

This hints to the conclusion that $e^M \in \mathrm{SO}(3)$ if $M \in \mathbb{R}^{3,3}$ is anti-symmetric.

The following proposition generalises the results of the exercises above, and provides a further example for the claim (f) from the Proposition 11.2.2, since the set $W_A \subset \mathbb{R}^{n,n}$ of antisymmetric matrices is the (matrix) Lie algebra $\mathfrak{so}(n)$, as shown in the Exercise 11.1.8.

Proposition 11.2.6 If $M \in \mathbb{R}^{n,n}$ is anti-symmetric, then $e^M$ is special orthogonal. The restriction of the exponential map to the Lie algebra $\mathfrak{so}(n)$ of anti-symmetric matrices is surjective onto SO($n$).

Proof We focus on the first claim which follows from point (a) of Proposition 11.2.2. If $M \in \mathbb{R}^{n,n}$ is anti-symmetric, ${}^tM = -M$ and $\mathrm{tr}(M) = 0$. Thus ${}^t(e^M) = e^{\,{}^tM} = e^{-M} = (e^M)^{-1}$ and $\det(e^M) = e^{\mathrm{tr}(M)} = e^0 = 1$. □

Remark 11.2.7 As the Exercise 11.2.5 directly shows, the restriction of the exponential map to the Lie algebra $\mathfrak{so}(n)$ of anti-symmetric matrices is not injective into SO($n$).

In the Example 11.3.1 below, we shall see explicitly that the exponential map, when restricted to 2-dimensional anti-symmetric matrices, is indeed surjective onto the group SO(2).
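
Both statements can be illustrated numerically for $n = 3$. The sketch below (helper names, sample parameters and the truncation order of the series are assumptions of ours) exponentiates a generic antisymmetric matrix and checks that the result is orthogonal with unit determinant; it then exponentiates the matrix of the Exercise 11.2.5 for the parameters $a$ and $a + 2\pi$, which give the same element of SO(3).

```python
import math


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, terms=60):
    """Truncated exponential series: the sum of A^k / k! for k < terms."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def det3(X):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (X[0][0] * (X[1][1] * X[2][2] - X[1][2] * X[2][1])
            - X[0][1] * (X[1][0] * X[2][2] - X[1][2] * X[2][0])
            + X[0][2] * (X[1][0] * X[2][1] - X[1][1] * X[2][0]))


def antisymmetric(alpha, beta, gamma):
    """General 3 x 3 antisymmetric matrix with parameters alpha, beta, gamma."""
    return [[0.0, -gamma, beta],
            [gamma, 0.0, -alpha],
            [-beta, alpha, 0.0]]


# A generic antisymmetric matrix: its exponential is special orthogonal.
R = mat_exp(antisymmetric(0.3, -1.1, 0.5))
gram = matmul([list(col) for col in zip(*R)], R)   # tR R, close to the identity
print(gram, det3(R))

# The matrix of Exercise 11.2.5 for a and a + 2*pi: same exponential.
a = 0.5
R1 = mat_exp(antisymmetric(0.0, 0.0, -a))
R2 = mat_exp(antisymmetric(0.0, 0.0, -(a + 2 * math.pi)))
print(R1)
print(R2)
```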


11.3  Rotations  in  Two  Dimensions 


We now study the spectral properties of orthogonal matrices. We start with the orthogonal group O(2).

Example 11.3.1 Let $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \in \mathbb{R}^{2,2}$. The condition ${}^tAA = A\,{}^tA = I_2$ is equivalent to the conditions for its entries given by

$$a_{11}^2 + a_{12}^2 = 1, \qquad a_{21}^2 + a_{22}^2 = 1, \qquad a_{11}a_{21} + a_{12}a_{22} = 0.$$

To solve these equations, let us assume $a_{11} \neq 0$ (the case $a_{22} \neq 0$ is analogous). We have then $a_{21} = -(a_{12}a_{22})/a_{11}$ from the third equation while, from the others, we have

$$\left( \frac{a_{12}^2}{a_{11}^2} + 1 \right) a_{22}^2 = 1 \quad \Rightarrow \quad a_{22}^2 = a_{11}^2.$$


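There are two possibilities.

• If $a_{11} = a_{22}$, it follows that $a_{12} + a_{21} = 0$, so the matrix $A$ can be written as

$$A_+ = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} \quad \text{with } a^2 + b^2 = 1,$$

and $\det(A_+) = a^2 + b^2 = 1$. One can write $a = \cos\varphi$, $b = \sin\varphi$, for $\varphi \in \mathbb{R}$, so to get

$$A_+ = \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}.$$

• If $a_{11} = -a_{22}$, it follows that $a_{12} = a_{21}$, so the matrix $A$ can be written as

$$A_- = \begin{pmatrix} a & b \\ b & -a \end{pmatrix} \quad \text{with } a^2 + b^2 = 1,$$

and we can write

$$A_- = \begin{pmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{pmatrix}$$

with $\det(A_-) = -a^2 - b^2 = -1$.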

Finally, it is easy to see that $a_{11} = 0$ would imply $a_{22} = 0$ and $a_{12}^2 = a_{21}^2 = 1$. These four cases correspond to $\varphi = \pm\frac{\pi}{2}$ for $A_+$ or $A_-$, according to whether $a_{12} = -a_{21}$ or $a_{12} = a_{21}$ respectively.

We see that the matrices $A_+$ make up the special orthogonal group $\mathrm{SO}(2)$, while the matrices $A_-$ give the orthogonal transformations in $E^2$ which in physics are usually called improper rotations.

Given the $2\pi$-periodicity of the trigonometric functions, we see that any element in the special orthogonal group $\mathrm{SO}(2)$ corresponds bijectively to an angle $\varphi \in [0, 2\pi)$.

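On the other hand, any improper orthogonal transformation can be factorised as the product of the matrix $Q = \mathrm{diag}(1, -1)$ times an $\mathrm{SO}(2)$ matrix,

\[ \begin{pmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{pmatrix}. \]

Thus, an improper orthogonal transformation 'reverses' one of the axes of any given orthogonal basis for $E^2$ and so changes its orientation.

Remark 11.3.2 Since $\mathrm{O}(2)$ is a group, the product of two improper orthogonal transformations is a special orthogonal transformation. We indeed compute

\[ \begin{pmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{pmatrix} \begin{pmatrix} \cos\varphi' & \sin\varphi' \\ \sin\varphi' & -\cos\varphi' \end{pmatrix} = \begin{pmatrix} \cos(\varphi' - \varphi) & \sin(\varphi' - \varphi) \\ -\sin(\varphi' - \varphi) & \cos(\varphi' - \varphi) \end{pmatrix} \in \mathrm{SO}(2). \]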

Proposition 11.3.3 A matrix $A \in \mathrm{SO}(2)$ is diagonalisable if and only if $A = \pm I_2$. An orthogonal matrix $A$ with $\det(A) = -1$ is diagonalisable, with spectrum given by $\lambda = \pm 1$.

Proof From the previous example we have:

(a) The eigenvalues $\lambda$ for a special orthogonal matrix are given by the solutions of the equation

\[ p_{A_+}(T) = (\cos\varphi - T)^2 + \sin^2\varphi = T^2 - 2(\cos\varphi)\,T + 1 = 0, \]

which are $\lambda_\pm = \cos\varphi \pm \sqrt{\cos^2\varphi - 1}$. This shows that $A_+$ is diagonalisable if and only if $\cos^2\varphi = 1$, that is $A_+ = \pm I_2$.

(b) Improper orthogonal matrices $A_-$ turn out to be diagonalisable since they are symmetric. The eigenvalue equation is

\[ p_{A_-}(T) = (T + \cos\varphi)(T - \cos\varphi) - \sin^2\varphi = T^2 - 1 = 0, \]

giving $\lambda_\pm = \pm 1$. □
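The two cases of the proof can be checked numerically. The following is a minimal sketch, assuming matrices stored as nested lists; the helper names `eigenvalues_2x2`, `proper` and `improper` are illustrative and do not come from the text. It solves the characteristic equation $T^2 - \mathrm{tr}(A)\,T + \det(A) = 0$ for one sample angle.

```python
import cmath
import math

def eigenvalues_2x2(m):
    """Roots of T^2 - tr(m) T + det(m) = 0 for a 2x2 matrix m (nested lists)."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)  # complex square root of the discriminant
    return ((tr + disc) / 2, (tr - disc) / 2)

def proper(phi):
    """The matrix A_+ of Example 11.3.1 (determinant +1)."""
    return [[math.cos(phi), math.sin(phi)], [-math.sin(phi), math.cos(phi)]]

def improper(phi):
    """The matrix A_- of Example 11.3.1 (determinant -1)."""
    return [[math.cos(phi), math.sin(phi)], [math.sin(phi), -math.cos(phi)]]

phi = 0.7  # sample angle, chosen arbitrarily
lam_plus = eigenvalues_2x2(proper(phi))     # complex pair cos(phi) +/- i sin(phi)
lam_minus = eigenvalues_2x2(improper(phi))  # real pair +1, -1
```

For an angle with $\sin\varphi \neq 0$ the eigenvalues of $A_+$ are not real, so $A_+$ has no real eigenvectors, while $A_-$ always has the real eigenvalues $\pm 1$.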




11.4  Rotations  in  Three  Dimensions 

We move to the analysis of rotations in three dimensional spaces.

Exercise 11.4.1 From Exercise 11.1.11 we know that the anti-symmetric matrices in $\mathbb{R}^{3,3}$ form a three dimensional vector space, thus any anti-symmetric matrix $M$ is labelled by a triple $(\alpha, \beta, \gamma)$ of real parameters. The vector $a = (\alpha, \beta, \gamma)$ is the generator, with respect to the canonical basis $\mathcal{E}$ of $E^3$, of the kernel of the endomorphism $\phi$ associated to $M$ with respect to the basis $\mathcal{E}$, $M = M_\phi^{\mathcal{E},\mathcal{E}}$.

Moreover, from the same exercise we know that there exists an orthogonal matrix $P$ which reduces $M$ to its canonical form (see Corollary 11.1.5), that is $M = P\,a_M\,P^{-1}$ with

\[ a_M = \begin{pmatrix} 0 & -\rho & 0 \\ \rho & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \rho^2 = \alpha^2 + \beta^2 + \gamma^2, \]

with respect to an orthonormal basis $\mathcal{C}$ for $E^3$ such that $P = M^{\mathcal{E},\mathcal{C}}$, the matrix of change of basis. From Exercise 11.2.5 it is

\[ \mathrm{SO}(3) \ni e^{a_M} = \begin{pmatrix} \cos\rho & -\sin\rho & 0 \\ \sin\rho & \cos\rho & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad (11.1) \]

and, if $R = e^M$, from Proposition 11.2.2 one has $R = P\,e^{a_M}\,P^{-1}$.

The only real eigenvalue of the orthogonal transformation $e^{a_M}$ is then $\lambda = 1$, corresponding to the 1-dimensional eigenspace spanned by the vector $a = (\alpha, \beta, \gamma)$. The vector line $\mathcal{L}(a)$ is therefore left unchanged by the isometry $\phi$ of $E^3$ corresponding to the matrix $R$, that is such that $M_\phi^{\mathcal{E},\mathcal{E}} = R$.

From Proposition 11.2.6 we know that, given $R \in \mathrm{SO}(3)$, there exists an anti-symmetric matrix $M \in \mathbb{R}^{3,3}$ such that $R = e^M$. The previous exercise then gives the proof of the following theorem.

Theorem 11.4.2 For any matrix $R \in \mathrm{SO}(3)$ with $R \neq I_3$ there exists an orthonormal basis $\mathcal{B}$ on $E^3$ with respect to which the matrix $R$ has the form (11.1).

This theorem, which is associated with the name of Euler, can also be stated as follows:

Theorem 11.4.3 Any special orthogonal matrix $R \in \mathrm{SO}(3)$ has the eigenvalue $+1$.

Those isometries $\phi \in \mathrm{End}(E^3)$ whose representing matrices $M_\phi^{\mathcal{E},\mathcal{E}}$ with respect to an orthonormal basis $\mathcal{E}$ are special orthogonal are also called 3-dimensional rotation endomorphisms, or rotations tout court. With a language used for the euclidean affine spaces (Chap. 15), we then have:

• For each rotation $R$ of $E^3$ different from the identity there exists a unique vector line (a direction) which is left unchanged by the action of the rotation. Such a vector line is called the rotation axis.

• The width of the rotation around the rotation axis is given by an angle $\rho$ obtained from (11.1), and implicitly given by

\[ 1 + 2\cos\rho = \mathrm{tr} \begin{pmatrix} \cos\rho & -\sin\rho & 0 \\ \sin\rho & \cos\rho & 0 \\ 0 & 0 & 1 \end{pmatrix} = \mathrm{tr}\,(P^{-1} R P) = \mathrm{tr}\,R, \qquad (11.2) \]

from the cyclic property of the trace.

Exercise 11.4.4 Consider the rotation of $E^3$ whose matrix is

\[ R = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta & \sin\beta \\ 0 & -\sin\beta & \cos\beta \end{pmatrix} \]

with respect to the canonical basis $\mathcal{E} = (e_1, e_2, e_3)$. Such a matrix is the product $R = R_1 R_2$ of two special orthogonal matrices. The matrix $R_1$ is a rotation with rotation axis the vector line $\mathcal{L}(e_3)$ and angular width $\alpha$, while $R_2$ is a rotation by the angle $\beta$ with rotation axis $\mathcal{L}(e_1)$. We wish to determine the rotation axis for $R$ with the corresponding angle. A direct calculation yields

\[ R = \begin{pmatrix} \cos\alpha & \sin\alpha\cos\beta & \sin\alpha\sin\beta \\ -\sin\alpha & \cos\alpha\cos\beta & \cos\alpha\sin\beta \\ 0 & -\sin\beta & \cos\beta \end{pmatrix}. \]

Since $R \neq I_3$ for $\alpha \neq 0$ and $\beta \neq 0$, the rotation axis is given by the eigenspace corresponding to the eigenvalue $\lambda = 1$. This eigenspace is found to be spanned by the vector $v$ with

\[ v = \big( \sin\alpha\,(1 - \cos\beta),\ (\cos\alpha - 1)(1 - \cos\beta),\ \sin\beta\,(1 - \cos\alpha) \big) \quad \text{if } \alpha \neq 0,\ \beta \neq 0, \]
\[ v = (1, 0, 0) \quad \text{if } \alpha = 0, \]
\[ v = (0, 0, 1) \quad \text{if } \beta = 0. \]

The rotation angle $\rho$ can be obtained (implicitly) from Eq. (11.2) as

\[ 1 + 2\cos\rho = \mathrm{tr}(R) = \cos\alpha + \cos\beta + \cos\alpha\cos\beta. \]
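The axis and the angle found in this exercise can be verified numerically. The sketch below is only an illustration under stated assumptions: the sample values of $\alpha$ and $\beta$ are arbitrary, and the helpers `matmul` and `apply` are hypothetical names for plain nested-list matrix operations.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    """Action of a 3x3 matrix on a vector of components."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

alpha, beta = 0.9, 0.4  # sample angles, both non zero
ca, sa = math.cos(alpha), math.sin(alpha)
cb, sb = math.cos(beta), math.sin(beta)

R1 = [[ca, sa, 0.0], [-sa, ca, 0.0], [0.0, 0.0, 1.0]]   # fixes e_3
R2 = [[1.0, 0.0, 0.0], [0.0, cb, sb], [0.0, -sb, cb]]   # fixes e_1
R = matmul(R1, R2)

# Candidate generator of the rotation axis, as stated in the exercise.
v = [sa * (1 - cb), (ca - 1) * (1 - cb), sb * (1 - ca)]
Rv = apply(R, v)
axis_error = max(abs(Rv[i] - v[i]) for i in range(3))   # should vanish: R v = v

# Rotation angle from the trace relation 1 + 2 cos(rho) = tr(R).
trace = R[0][0] + R[1][1] + R[2][2]
rho = math.acos((trace - 1) / 2)
```

The quantity `axis_error` measures how far $Rv$ is from $v$, and `rho` is the angle in $[0, \pi]$ determined by the trace.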

Exercise 11.4.5 Since the special orthogonal group $\mathrm{SO}(n)$ is non abelian for $n > 2$, for the special orthogonal matrix given by $R' = R_2 R_1$ one has $R' \neq R$. The matrix $R'$ can be written as

\[ R' = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha\cos\beta & \cos\alpha\cos\beta & \sin\beta \\ \sin\alpha\sin\beta & -\sin\beta\cos\alpha & \cos\beta \end{pmatrix}. \]

One now computes that, while the rotation angle is the same as in the previous exercise, the rotation axis is spanned by the vector $v'$ with

\[ v' = \big( \sin\alpha\sin\beta,\ (1 - \cos\alpha)\sin\beta,\ (1 - \cos\alpha)(1 + \cos\beta) \big) \quad \text{if } \alpha \neq 0,\ \beta \neq 0, \]
\[ v' = (1, 0, 0) \quad \text{if } \alpha = 0, \]
\[ v' = (0, 0, 1) \quad \text{if } \beta = 0. \]


Exercise 11.4.6 Consider the matrix $R'' = Q_1 Q_2$ given by

\[ R'' = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ \sin\alpha & -\cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\beta' & \sin\beta' \\ 0 & \sin\beta' & -\cos\beta' \end{pmatrix}. \]

Now neither $Q_1$ nor $Q_2$ is a (proper) rotation matrix: both $Q_1$ and $Q_2$ are in $\mathrm{O}(3)$, but $\det(Q_1) = \det(Q_2) = -1$ (see Example 11.3.1, where $\mathrm{O}(2)$ has been described, and Remark 11.3.2). The matrix $R''$ is nonetheless special orthogonal since $\mathrm{O}(3)$ is a group and $\det(R'') = 1$.

One finds that the rotation axis is the vector line spanned by the vector $v''$ with

\[ v'' = \big( \sin\alpha\sin\beta',\ (1 - \cos\alpha)\sin\beta',\ (1 - \cos\alpha)(1 - \cos\beta') \big) \quad \text{if } \alpha \neq 0,\ \beta' \neq 0, \]
\[ v'' = (1, 0, 0) \quad \text{if } \alpha = 0, \]
\[ v'' = (0, 0, 1) \quad \text{if } \beta' = \pi. \]

One way to establish this result without doing explicit computation is to observe that $R''$ is obtained from $R'$ in Exercise 11.4.5 under a transposition and the identification $\beta' = \pi - \beta$.

Exercise 11.4.7 As an easy application of Theorem 10.1.13 we know that, if $\mathcal{B} = (u_1, u_2, u_3)$ and $\mathcal{C} = (v_1, v_2, v_3)$ are orthonormal bases in $E^3$, then the orthogonal endomorphism $\phi$ mapping $v_k \mapsto u_k$ is represented by a matrix whose entry $ab$ is given by the scalar product $u_b \cdot v_a$,

\[ M_\phi^{\mathcal{C},\mathcal{C}} = \Phi = \big( \Phi_{ab} = u_b \cdot v_a \big)_{a,b = 1,2,3}. \]

It is easy indeed to see that the matrix element $({}^t\Phi\,\Phi)_{ks}$ is given by

\[ \sum_{a=1}^{3} \Phi_{ak}\Phi_{as} = \sum_{a=1}^{3} (u_k \cdot v_a)(u_s \cdot v_a) = u_k \cdot u_s = \delta_{ks}, \]

thus proving that $\Phi$ is orthogonal. Notice that ${}^t\Phi = \Phi^{-1}$.

Exercise 11.4.8 Let $\mathcal{E} = (e_1, e_2, e_3)$ be an orthonormal basis for $E^3$. We compute the rotation matrix corresponding to the change of basis $\mathcal{E} \to \mathcal{B}$ with $\mathcal{B} = (u_1, u_2, u_3)$, for any given basis $\mathcal{B}$ with the same orientation as $\mathcal{E}$ (see Definition 10.1.15).

Firstly, consider a vector $u$ of norm 1. Since such a vector defines a point on a sphere of radius 1 in the three dimensional physical space $\mathcal{S}$, which can be identified by a latitude and a longitude, its components with respect to $\mathcal{E}$ are determined by two angles. With respect to Figure 11.1 we write them as

\[ u = (\sin\varphi\sin\theta,\ -\cos\varphi\sin\theta,\ \cos\theta) \]

with $\theta \in (0, \pi)$ and $\varphi \in [0, 2\pi)$. Then, to complete $u$ to an orthonormal basis for $E^3$ with $u'_3 = u$, one finds

\[ u'_1 = u_N = (\cos\varphi,\ \sin\varphi,\ 0), \qquad u'_2 = (-\sin\varphi\cos\theta,\ \cos\varphi\cos\theta,\ \sin\theta). \]

Fig. 11.1 The Euler angles

The rotation matrix (with respect to the basis $\mathcal{E}$) of the transformation $\mathcal{E} \to (u'_1, u'_2, u'_3)$ is given by

\[ \begin{pmatrix} \cos\varphi & -\sin\varphi\cos\theta & \sin\varphi\sin\theta \\ \sin\varphi & \cos\varphi\cos\theta & -\cos\varphi\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}. \]

Since the choice of $u'_1, u'_2$ is unique up to a rotation around the orthogonal vector $u$, we see that the most general $\mathrm{SO}(3)$ rotation matrix mapping $e_3 \mapsto u$ is given by

\[ R(\theta, \varphi, \psi) = \begin{pmatrix} \cos\varphi & -\sin\varphi\cos\theta & \sin\varphi\sin\theta \\ \sin\varphi & \cos\varphi\cos\theta & -\cos\varphi\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
\[ = \begin{pmatrix} \cos\varphi\cos\psi - \sin\varphi\cos\theta\sin\psi & -\cos\varphi\sin\psi - \sin\varphi\cos\theta\cos\psi & \sin\varphi\sin\theta \\ \sin\varphi\cos\psi + \cos\varphi\cos\theta\sin\psi & -\sin\varphi\sin\psi + \cos\varphi\cos\theta\cos\psi & -\cos\varphi\sin\theta \\ \sin\theta\sin\psi & \sin\theta\cos\psi & \cos\theta \end{pmatrix} \]

with $\psi \in [0, 2\pi)$. This shows that the proper 3-dimensional rotations, that is the group $\mathrm{SO}(3)$, can be parametrised by 3 angles. Such angles are usually called Euler angles, and clearly there exist several (consistent and equivalent) different choices for them.

Our result depends on the assumption that $\sin\theta \neq 0$, which means that $u_1 \neq \pm e_3$ (the excluded case being the one in which $u_1$ is along the north-south pole direction). The most general rotation matrix representing an orthogonal transformation with $e_1 \mapsto u_1 = \pm e_3$ is given by

\[ \begin{pmatrix} 0 & \cos\psi & \mp\sin\psi \\ 0 & \sin\psi & \pm\cos\psi \\ \pm 1 & 0 & 0 \end{pmatrix}. \]

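We finally remark that the rotation matrix $R(\theta, \varphi, \psi)$ can be written as the product

\[ R(\theta, \varphi, \psi) = \begin{pmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

This identity shows that we can write

\[ R(\theta, \varphi, \psi) = e^{\varphi L_3}\, e^{\theta L_1}\, e^{\psi L_3}, \]

where $L_1$ and $L_3$ are the matrices in Exercise 11.1.10. These matrices are the 'generators' of the rotations around the first and third axis, respectively.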
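The factorisation above lends itself to a direct numerical check. The following sketch, with hypothetical helper names `rot_z`, `rot_x` and `euler` and arbitrary sample angles, builds $R(\theta, \varphi, \psi)$ as a product of three elementary rotations and compares its third column with the unit vector $u$ of the exercise.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(t):
    """Rotation by the angle t around the third axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    """Rotation by the angle t around the first axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def euler(theta, phi, psi):
    """R(theta, phi, psi) as the product rot_z(phi) rot_x(theta) rot_z(psi)."""
    return matmul(matmul(rot_z(phi), rot_x(theta)), rot_z(psi))

theta, phi, psi = 1.1, 0.5, 2.3  # sample Euler angles
R = euler(theta, phi, psi)

# The image of e_3 is the third column of R; it should coincide with u.
third_column = [R[i][2] for i in range(3)]
u = [math.sin(phi) * math.sin(theta), -math.cos(phi) * math.sin(theta), math.cos(theta)]

# Orthogonality check: R times its transpose should be the identity.
Rt = [[R[j][i] for j in range(3)] for i in range(3)]
RRt = matmul(R, Rt)
```

The third column does not depend on $\psi$, in agreement with the fact that $\psi$ only rotates $u'_1, u'_2$ around $u$.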

In applications to the dynamics of a rigid body, with reference to Figure 11.1, the angle $\varphi$ parametrises the motion of precession of the axis $u_3$ around the axis $e_3$, the angle $\theta$ the motion of nutation of the axis $u_3$ and the angle $\psi$ the intrinsic rotation around the axis $u_3$. The unit vector $u_N$ indicates the line of nodes, the intersection of the plane $(e_1 e_2)$ with the plane $(u_1 u_2)$.

We close this section by listing the most interesting properties of orthogonal endomorphisms in $E^n$ with $n > 0$. Endomorphisms $\phi$ whose representing matrix $M_\phi^{\mathcal{E},\mathcal{E}}$ is special orthogonal, with respect to an orthonormal basis $\mathcal{E}$ for $E^n$, are called rotations. From Proposition 11.2.6 we know that there exists an anti-symmetric matrix $M$ such that $M_\phi^{\mathcal{E},\mathcal{E}} = e^M$. When $\mathrm{rk}(M) = 2k$, the matrix $e^M$ depends on $k$ angular variables.

From Corollary 11.1.5 and a direct generalisation of the computations above, one can conclude that for each $n$-dimensional rotation:

• There exists a vector subspace $V \subset E^n$ which is left unchanged by the action of the rotation, with $\dim(V) = n - \mathrm{rk}(M)$.

• Since $\mathrm{rk}(M)$ is even, we have that, if $n$ is odd, then $V$ is odd dimensional as well, and at least one dimensional. If $n$ is even and the matrix $M$ is invertible, then $V$ is the null space.


11.5 The Lie Algebra $\mathfrak{so}(3)$

We have a closer look at the Lie algebra $\mathfrak{so}(3)$ introduced in Exercise 11.1.10. As mentioned, it is three dimensional and generated by the three matrices

\[ L_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \qquad L_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]

which are closed under matrix commutator.

Consider the three dimensional euclidean totally antisymmetric Levi-Civita symbol $\varepsilon_{a_1 a_2 a_3}$ with indices $a_j = 1, 2, 3$ and defined by

\[ \varepsilon_{a_1 a_2 a_3} = \begin{cases} +1 & \text{if } (a_1, a_2, a_3) \text{ is an even permutation of } (1, 2, 3), \\ -1 & \text{if } (a_1, a_2, a_3) \text{ is an odd permutation of } (1, 2, 3), \\ 0 & \text{if any two indices are equal.} \end{cases} \]

One has the identity $\sum_{a=1}^{3} \varepsilon_{abc}\,\varepsilon_{aks} = \delta_{bk}\delta_{cs} - \delta_{bs}\delta_{ck}$.

Exercise 11.5.1 Using the Levi-Civita symbol, it is easy to see that the generators $L_a$ have components given by

\[ (L_a)_{mn} = \varepsilon_{man}, \]

while their commutators are written as

\[ [L_m, L_n] = \sum_{a=1}^{3} \varepsilon_{mna}\, L_a. \]
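The component formula and the commutation relations can be checked with exact integer arithmetic. This is a minimal sketch: the closed formula used in `epsilon` is one convenient way to evaluate the Levi-Civita symbol for indices in $\{1, 2, 3\}$, and the names `generator`, `commutator` and `structure_rhs` are illustrative.

```python
def epsilon(i, j, k):
    """Levi-Civita symbol for indices taking values in {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2

def generator(a):
    """Matrix L_a with entries (L_a)_{mn} = epsilon_{man}."""
    return [[epsilon(m, a, n) for n in (1, 2, 3)] for m in (1, 2, 3)]

def matmul(a, b):
    """Product of two 3x3 integer matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

L = {a: generator(a) for a in (1, 2, 3)}

def structure_rhs(m, n):
    """Right hand side sum_a epsilon_{mna} L_a of the commutation relations."""
    return [[sum(epsilon(m, n, a) * L[a][i][j] for a in (1, 2, 3))
             for j in range(3)] for i in range(3)]

# Compare [L_m, L_n] with sum_a epsilon_{mna} L_a for all pairs (m, n).
commutators_ok = all(commutator(L[m], L[n]) == structure_rhs(m, n)
                     for m in (1, 2, 3) for n in (1, 2, 3))
```

In particular `commutator(L[1], L[2])` reproduces `L[3]`, and cyclically.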


There is an important subtlety when identifying $3 \times 3$ antisymmetric matrices with three dimensional vectors. The most general antisymmetric matrix is indeed characterised by three scalars,

\[ A = \begin{pmatrix} 0 & -v_3 & v_2 \\ v_3 & 0 & -v_1 \\ -v_2 & v_1 & 0 \end{pmatrix} = \sum_{a=1}^{3} v_a L_a. \]

For the time being, this only defines a triple of numbers $(v_1, v_2, v_3)$ in $E^3$. Whether this triple provides the components of a vector in the three dimensional euclidean space will depend on how it transforms under an orthogonal transformation. Now, we may think of $A$ as the matrix, with respect to the canonical orthonormal basis $\mathcal{E}$, of a skew-adjoint endomorphism $\phi$ on $E^3$: $A = M_\phi^{\mathcal{E},\mathcal{E}}$. When changing basis to an orthonormal basis $\mathcal{B}$ with matrix of change of basis $R = M^{\mathcal{E},\mathcal{B}} \in \mathrm{O}(3)$, the matrix $A$ is transformed to

\[ A' = R A R^{-1} = R A\, {}^tR, \]

since $R$ is orthogonal and thus $R^{-1} = {}^tR$. Since $A'$ is antisymmetric as well, it can be written as $A' = \sum_{c=1}^{3} v'_c L_c$ for some $(v'_1, v'_2, v'_3)$. In order to establish the transformation rule from $(v_1, v_2, v_3)$ to $(v'_1, v'_2, v'_3)$, we need an additional result on orthogonal matrices.

Exercise 11.5.2 Using the expression in Sect. 5.3 for the inverse of an invertible matrix, the orthogonality condition for a matrix $R \in \mathrm{O}(3)$, that is $R_{ab} = ({}^tR)_{ba} = (R^{-1})_{ba}$, can be written as

\[ R_{ab} = \frac{1}{\det R}\, (-1)^{a+b} \det(\widehat{R}_{ab}), \]

where $\widehat{R}_{ab}$ is the 2 dimensional matrix obtained by deleting the row $a$ and the column $b$ in the 3 dimensional matrix $R$. (Then $\det(\widehat{R}_{ab})$ is the minor of the element $R_{ab}$, see Definition 5.1.7.) In terms of the Levi-Civita symbol this identity transforms to

\[ \sum_{j=1}^{3} \varepsilon_{mjn} R_{jq} = \frac{1}{\det R} \sum_{a,b=1}^{3} R_{ma}\, \varepsilon_{aqb}\, R_{nb}, \qquad (11.3) \]

or, ${}^tR$ being orthogonal as well, with $\det {}^tR = \det R$,

\[ \sum_{j=1}^{3} \varepsilon_{mjn} R_{qj} = \frac{1}{\det R} \sum_{a,b=1}^{3} R_{am}\, \varepsilon_{aqb}\, R_{bn}. \qquad (11.4) \]


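Going back to $A = \sum_{a=1}^{3} v_a L_a$ and $A' = \sum_{b=1}^{3} v'_b L_b$, we have for their components:

\[ A_{mn} = \sum_{j=1}^{3} v_j\, \varepsilon_{mjn} \qquad \text{and} \qquad A'_{mn} = \sum_{j=1}^{3} v'_j\, \varepsilon_{mjn}. \]

We then compute, using the relation (11.3) and $(\det R)^{-1} = \det R$,

\[ A'_{mn} = (R A\, {}^tR)_{mn} = \sum_{a,b=1}^{3} R_{ma} A_{ab} R_{nb} = \sum_{j=1}^{3} \sum_{a,b=1}^{3} v_j\, \varepsilon_{ajb}\, R_{ma} R_{nb} \]
\[ = (\det R) \sum_{j=1}^{3} \sum_{c=1}^{3} R_{cj} v_j\, \varepsilon_{mcn} = (\det R) \sum_{c=1}^{3} (Rv)_c\, \varepsilon_{mcn}, \]

that is,

\[ v'_j = (\det R)\, (Rv)_j = (\det R) \sum_{c=1}^{3} R_{jc} v_c. \]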


This shows that, under an orthogonal transformation between different bases of $E^3$, the components of an antisymmetric matrix transform as the components of a vector only if the orientation is preserved, that is only if the transformation is special orthogonal.

Using a terminology from physics, elements in $E^3$ whose components with respect to orthonormal bases transform as the general theory (see Proposition 7.9.2) prescribes are called polar vectors (or vectors tout court), while elements in $E^3$ whose components transform as the components of an antisymmetric matrix are called axial (or pseudo) vectors.

An example of an axial vector is given by the vector product in $E^3$ of two (polar) vectors, which we recall from Chap. 1. To be definite, let us start with the canonical orthonormal basis $\mathcal{E}$. If $v = (v_1, v_2, v_3)$ and $w = (w_1, w_2, w_3)$, Proposition 1.3.15 defines the vector product of $v$ and $w$ as

\[ \tau(v, w) = v \wedge w = (v_2 w_3 - v_3 w_2,\ v_3 w_1 - v_1 w_3,\ v_1 w_2 - v_2 w_1). \]

Using the Levi-Civita symbol, the components are written as

\[ (v \wedge w)_a = \sum_{b,c=1}^{3} \varepsilon_{abc}\, v_b w_c. \]

If $R = M^{\mathcal{E},\mathcal{B}} \in \mathrm{O}(3)$ is the change of basis to a new orthonormal basis $\mathcal{B}$ for $E^3$, on one hand we have $(v \wedge w)'_q = (R(v \wedge w))_q$, while the relation (11.4) yields

\[ (v' \wedge w')_q = \sum_{k,j=1}^{3} \varepsilon_{qkj}\, v'_k w'_j = \sum_{k,j,b,s=1}^{3} \varepsilon_{qkj}\, R_{kb} R_{js}\, v_b w_s = (\det R) \sum_{a,b,s=1}^{3} R_{qa}\, \varepsilon_{abs}\, v_b w_s = (\det R)\, (v \wedge w)'_q. \]

This shows that the components of a vector product transform as an axial vector under an orthogonal transformation between different bases of $E^3$. In a similar manner one shows that the vector product of an axial vector with a polar vector is a polar vector.
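The relation $(Rv) \wedge (Rw) = (\det R)\, R(v \wedge w)$ obtained above can be tested on a few orthogonal matrices with integer entries, so that all comparisons are exact. The sketch below is illustrative only; the helper names and the sample vectors are assumptions made for the example.

```python
def cross(v, w):
    """Vector product of two triples of components."""
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]

def apply(m, v):
    """Action of a 3x3 matrix on a vector of components."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def transforms_as_axial(R, v, w):
    """Check (Rv) ^ (Rw) == det(R) * R(v ^ w) for an orthogonal matrix R."""
    lhs = cross(apply(R, v), apply(R, w))
    rhs = [det3(R) * c for c in apply(R, cross(v, w))]
    return lhs == rhs

v, w = [1, 2, 3], [-2, 1, 4]                      # sample polar vectors
inversion = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]  # -I_3, determinant -1
reflection = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # determinant -1
rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]     # quarter turn around e_3

results = [transforms_as_axial(R, v, w) for R in (inversion, reflection, rotation)]
```

For the inversion $-I_3$ the components of $v$ and $w$ change sign while those of $v \wedge w$ do not, which is the behaviour that distinguishes an axial vector from a polar one.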

Exercise 11.5.3 For example, the change of basis from $\mathcal{B}$ to $\mathcal{B}' = (b'_1 = -b_1,\ b'_2 = -b_2,\ b'_3 = -b_3)$ is clearly represented by the matrix $M^{\mathcal{B},\mathcal{B}'} = M^{\mathcal{B}',\mathcal{B}} = -I_3$, which is orthogonal but not special orthogonal. It is immediate to see that we have $v = (-v_1, -v_2, -v_3)_{\mathcal{B}'}$ and $w = (-w_1, -w_2, -w_3)_{\mathcal{B}'}$, but $v \wedge w = (v_2 w_3 - v_3 w_2,\ v_3 w_1 - v_1 w_3,\ v_1 w_2 - v_2 w_1)_{\mathcal{B}'}$.

From Example 1.3.17 we see that, since the physical observables position, velocity, acceleration and force are described by polar vectors, both momenta and angular momenta for the dynamics of a point mass are axial vectors.

Exercise 11.5.4 We recall from Sect. 1.4 the action of the operator rot on a vector field $A(x)$,

\[ \mathrm{rot}\, A = \nabla \wedge A = \sum_{i,j,k=1}^{3} (\varepsilon_{ijk}\, \partial_j A_k)\, e_i \]

with respect to an orthonormal basis $\mathcal{E} = (e_1, e_2, e_3)$ of $E^3$ which represents the physical space $\mathcal{S}$. This identity shows that, if $A$ is a polar vector (field), then $\mathrm{rot}\, A$ is an axial vector (field).

Example 11.5.5 The (Lorentz) force $F$ acting on a point electric charge $q$ whose motion is given by $x(t)$, in the presence of an electric field $E(x)$ and a magnetic field $B(x)$, is written as

\[ F = q\,(E + \dot{x} \wedge B). \]

We conclude that $E$ is a polar vector field, while $B$ is an axial vector field. Indeed, the correct way to describe $B$ is with an antisymmetric $3 \times 3$ matrix.


11.6  The  Angular  Velocity 

When dealing with rotations in physics, an important notion is that of angular velocity. This and several related notions can be analysed in terms of the spectral properties of orthogonal matrices that we have illustrated above. It is worth recalling from Chap. 1 that euclidean vector spaces with orthonormal bases are the natural framework for the notion of cartesian orthogonal coordinate systems for the physical space $\mathcal{S}$ (inertial reference frames).

Example 11.6.1 Consider the motion $x(t)$ in $E^3$ of a point mass such that its distance $\|x(t)\|$ from the origin of the coordinate system is fixed. We then consider a fixed orthonormal basis $\mathcal{E} = (e_1, e_2, e_3)$, and an orthonormal basis $\mathcal{E}' = (e'_1(t), e'_2(t), e'_3(t))$ which rotates with respect to $\mathcal{E}$ in such a way that the components of $x(t)$ along $\mathcal{E}'$ do not depend on time, that is the point mass is at rest with respect to $\mathcal{E}'$. We can write the position vector $x(t)$ as

\[ x(t) = \sum_{a=1}^{3} x_a(t)\, e_a = \sum_{k=1}^{3} x'_k\, e'_k(t). \]

Since $\mathcal{E}'$ depends on time, the change of the basis is given by a time-dependent orthogonal matrix $M^{\mathcal{E},\mathcal{E}'(t)} = R(t) \in \mathrm{SO}(3)$ as

\[ x_k(t) = \sum_{j=1}^{3} R_{kj}(t)\, x'_j. \]

By differentiating with respect to time $t$ (recall that the dot means time derivative), with $\dot{x}'_j = 0$, the above relation gives

\[ \dot{x}_k = \sum_{a=1}^{3} \dot{R}_{ka}\, x'_a = \sum_{a,b=1}^{3} \dot{R}_{ka} (R^{-1})_{ab}\, x_b = \sum_{a,b=1}^{3} \dot{R}_{ka} ({}^tR)_{ab}\, x_b. \]

From the relation $R(t)\, {}^tR(t) = I_3$ it follows, by differentiating with respect to $t$, that

\[ \dot{R}\, {}^tR + R\, {}^t\dot{R} = 0 \quad \Rightarrow \quad \dot{R}\, {}^tR = -R\, ({}^t\dot{R}) = -\,{}^t(\dot{R}\, {}^tR). \]

We see that the matrix $\dot{R}\, {}^tR$ is antisymmetric, so from Exercise 11.1.11 there exist real scalars $(\omega_1(t), \omega_2(t), \omega_3(t))$ such that

\[ \dot{R}\, {}^tR = \begin{pmatrix} 0 & -\omega_3(t) & \omega_2(t) \\ \omega_3(t) & 0 & -\omega_1(t) \\ -\omega_2(t) & \omega_1(t) & 0 \end{pmatrix}. \qquad (11.5) \]

A comparison with Example 1.3.17 then shows that the expression for the velocity,

\[ \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \end{pmatrix} = \begin{pmatrix} 0 & -\omega_3(t) & \omega_2(t) \\ \omega_3(t) & 0 & -\omega_1(t) \\ -\omega_2(t) & \omega_1(t) & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \]

can be written as

\[ \dot{x}(t) = \omega(t) \wedge x(t). \qquad (11.6) \]

The triple $\omega(t) = (\omega_1(t), \omega_2(t), \omega_3(t))$ is the angular velocity vector of the motion described by the rotation $R(t)$.
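As a numerical illustration of (11.5), one may take a rotation around the third axis by a time dependent angle and approximate $\dot{R}$ by a central finite difference. Everything in the sketch below is an assumption made for the example: the angle law `alpha`, the step `h` and the helper names do not come from the text.

```python
import math

def rot_z(a):
    """Rotation by the angle a around the third axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def alpha(t):
    """Sample angle law alpha(t) = 0.3 t^2."""
    return 0.3 * t * t

def alpha_dot(t):
    """Exact time derivative of alpha."""
    return 0.6 * t

def rdot_rt(t, h=1e-6):
    """Approximate the matrix (dR/dt) times transpose(R) at time t."""
    rp, rm = rot_z(alpha(t + h)), rot_z(alpha(t - h))
    rdot = [[(rp[i][j] - rm[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
    return matmul(rdot, transpose(rot_z(alpha(t))))

t0 = 2.0
W = rdot_rt(t0)
# Read off the angular velocity from the antisymmetric matrix, as in (11.5).
omega = (W[2][1], W[0][2], W[1][0])
```

Up to discretisation error, `W` is antisymmetric and `omega` is close to $(0, 0, \dot\alpha(t_0))$, the angular velocity of a rotation around the third axis.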

As we shall see in Example 11.7.1, this relation also describes the rotation of a rigid body with a fixed point.

Exercise 11.6.2 The velocity corresponding to the motion in $E^3$ given by (here $r > 0$)

\[ x(t) = (r\cos\alpha(t),\ r\sin\alpha(t),\ 0) \]

with respect to an orthonormal basis $\mathcal{E}$ is

\[ \dot{x}(t) = \dot{\alpha}\,(-r\sin\alpha(t),\ r\cos\alpha(t),\ 0) = \omega(t) \wedge x(t) \]

with $\omega(t) = (0, 0, \dot{\alpha})$.

From Sect. 11.5, we know that the angular velocity is an axial vector, so we write

\[ \omega_a(t) \ \mapsto\ \omega'_b(t) = (\det P) \sum_{a=1}^{3} P_{ab}\, \omega_a(t) \]

for the transformation of the components under a change of basis in $E^3$ given by an orthogonal matrix $P \in \mathrm{O}(3)$. Notice that the relation (11.6) shows that the vector $\dot{x}(t)$, although expressed via an axial vector, is a polar vector, since the vector product between an axial vector and a polar vector yields a polar vector. This is consistent with the formulation of $\dot{x}(t)$ as the physical velocity of a point mass.

A different perspective on these notions and examples allows one to study how the dynamics of a point mass is described with respect to different reference systems, in physicists' parlance.

Example 11.6.3 We describe the motion of a point mass with respect to an orthonormal basis $\mathcal{E} = (e_1, e_2, e_3)$ and with respect to an orthonormal basis $\mathcal{E}'(t) = (e'_1(t), e'_2(t), e'_3(t))$ that rotates with respect to $\mathcal{E}$. So we write

\[ x(t) = \sum_{a=1}^{3} x_a(t)\, e_a = \sum_{k=1}^{3} x'_k(t)\, e'_k(t). \]

Considering the time derivative of both sides, we have

\[ \dot{x}(t) = \sum_{a=1}^{3} \dot{x}_a(t)\, e_a = \sum_{k=1}^{3} \dot{x}'_k(t)\, e'_k(t) + \sum_{k=1}^{3} x'_k(t)\, \dot{e}'_k(t). \]

Using the results of Example 11.6.1, the second term can be written by means of an angular velocity $\omega(t)$ and thus we have

\[ \dot{x}(t) = \dot{x}'(t) + \omega(t) \wedge x'(t), \]

where $v = \dot{x}$ is the velocity of the point mass with respect to $\mathcal{E}$, while $v' = \dot{x}'$ is the velocity of the point mass with respect to $\mathcal{E}'(t)$.

Going one step further along the same line, taking a second time derivative results in

\[ \ddot{x}(t) = \ddot{x}'(t) + \omega(t) \wedge \dot{x}'(t) + \omega(t) \wedge \big( \dot{x}'(t) + \omega(t) \wedge x'(t) \big) + \dot{\omega}(t) \wedge x'(t) \]
\[ = \ddot{x}'(t) + 2\,\omega(t) \wedge \dot{x}'(t) + \omega(t) \wedge \big( \omega(t) \wedge x'(t) \big) + \dot{\omega}(t) \wedge x'(t). \]

Using the language of physics, the term $\ddot{x}(t)$ is the acceleration of the point mass with respect to the 'observer' at rest $\mathcal{E}$, while $\ddot{x}'(t)$ gives its acceleration with respect to the moving 'observer' $\mathcal{E}'(t)$.

With the rotation of $\mathcal{E}'(t)$ with respect to $\mathcal{E}$ given in terms of the angular velocity $\omega(t)$, the term

\[ a_C = 2\,\omega(t) \wedge \dot{x}'(t) \]

is called the Coriolis acceleration, the term

\[ a_R = \omega(t) \wedge \big( \omega(t) \wedge x'(t) \big) \]

is the radial (that is parallel to $x'(t)$) acceleration, while the term

\[ a_T = \dot{\omega}(t) \wedge x'(t) \]

is the tangential (that is orthogonal to $x'(t)$) one, depending on the variation of the angular velocity.


11.7  Rigid  Bodies  and  Inertia  Matrix 

Example 11.7.1 Consider a system of point masses $\{m_{(j)}\}_{j=1,\dots,N}$ whose mutual distances in $E^3$ are constant, so that it can be considered as an example of a rigid body. The dynamics of each point mass is described by vectors $x_{(j)}(t)$.

If we do not consider rigid translations, each motion $x_{(j)}(t)$ is a rotation with the same angular velocity $\omega(t)$ around a fixed point. If we assume, with no loss of generality, that the fixed point coincides with the centre of mass of the system, and we set it to be the origin of $E^3$, then the total angular momentum of the system (the natural generalization of the angular momentum defined for a single point mass in Example 1.3.17) is given by (using (11.6))

\[ L(t) = \sum_{j=1}^{N} m_{(j)}\, x_{(j)}(t) \wedge \dot{x}_{(j)}(t) = \sum_{j=1}^{N} m_{(j)}\, x_{(j)}(t) \wedge \big( \omega(t) \wedge x_{(j)}(t) \big). \]

With an orthonormal basis $\mathcal{E} = (e_1, e_2, e_3)$ for $E^3$, so that $x_{(j)} = (x_{(j)1}, x_{(j)2}, x_{(j)3})$, and using the definition of vector product in terms of the Levi-Civita symbol, it is straightforward to compute that $L = (L_1, L_2, L_3)$ is given by

\[ L_k = \sum_{s=1}^{3} \Big( \sum_{j=1}^{N} m_{(j)} \big( \|x_{(j)}\|^2\, \delta_{ks} - x_{(j)k}\, x_{(j)s} \big) \Big)\, \omega_s. \]

(In order to lighten notations, we drop for this example the explicit $t$ dependence on the maps.) This expression can be written as

\[ L_k = \sum_{s=1}^{3} I_{ks}\, \omega_s \]

where the quantities

\[ I_{ks} = \sum_{j=1}^{N} m_{(j)} \big( \|x_{(j)}\|^2\, \delta_{ks} - x_{(j)k}\, x_{(j)s} \big) \]

are the entries of the so called inertia matrix $\mathcal{I}$ (or inertia tensor) of the rigid body under analysis.

It is evident that the inertia matrix is symmetric, so from Proposition 10.5.1 there exists an orthonormal basis for $E^3$ of eigenvectors for it. Moreover, if $\lambda$ is an eigenvalue with eigenvector $u$, we have

\[ \lambda \|u\|^2 = \sum_{k,s=1}^{3} I_{ks}\, u_k u_s = \sum_{j=1}^{N} m_{(j)} \big( \|u\|^2 \|x_{(j)}\|^2 - (u \cdot x_{(j)})^2 \big) \geq 0, \]

where the last relation comes from the Schwarz inequality of Proposition 3.1.8. This means that $\mathcal{I}$ has no negative eigenvalues. If $(u_1, u_2, u_3)$ is the orthonormal basis for which the inertia matrix is diagonal, and $(\lambda_1, \lambda_2, \lambda_3)$ are the corresponding eigenvalues, the vector lines $\mathcal{L}(u_a)$ are the so called principal axes of inertia for the rigid body, while the eigenvalues are the moments of inertia.

We  give  some  basic  examples  for  the  inertia  matrix  of  a  rigid  body. 

Exercise 11.7.2 Consider a rigid body given by two point masses with $m_{(1)} = \alpha\, m_{(2)} = \alpha m$ with $\alpha > 0$, whose position is given in $E^3$ by the vectors $x_{(1)} = (0, 0, r)$ and $x_{(2)} = (0, 0, -\alpha r)$ with $r > 0$. The corresponding inertia matrix is found to be

\[ \mathcal{I} = \alpha(1 + \alpha)\, m r^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

The principal axes of inertia coincide with the vector lines spanned by the orthonormal basis $\mathcal{E}$. The rigid body has two non zero moments of inertia; the third moment of inertia is zero since the rigid body is one dimensional.

Consider a rigid body given by three equal masses $m_{(j)} = m$ and

\[ x_{(1)} = (r, 0, 0), \qquad x_{(2)} = \tfrac{1}{2}(-r, \sqrt{3}\, r, 0), \qquad x_{(3)} = \tfrac{1}{2}(-r, -\sqrt{3}\, r, 0) \]

with $r > 0$, with respect to an orthonormal basis $\mathcal{E}$ in $E^3$. The inertia matrix is computed to be

\[ \mathcal{I} = \frac{3\, m r^2}{2} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \]

so the basis elements of $\mathcal{E}$ provide the principal axes of inertia.

Finally, consider a rigid body in $E^3$ consisting of four point masses with $m_{(j)} = m$ and

\[ x_{(1)} = (r, 0, 0), \qquad x_{(2)} = (-r, 0, 0), \qquad x_{(3)} = (0, r, 0), \qquad x_{(4)} = (0, -r, 0) \]

with $r > 0$. The inertia matrix is already diagonal with respect to $\mathcal{E}$, whose basis elements give the principal axes of inertia for the rigid body, and it reads

\[ \mathcal{I} = 2\, m r^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]
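The inertia matrices of these examples follow directly from the formula $I_{ks} = \sum_j m_{(j)} (\|x_{(j)}\|^2 \delta_{ks} - x_{(j)k} x_{(j)s})$. The sketch below evaluates it for the three and four mass configurations; the function name `inertia_matrix` and the numerical values chosen for $m$ and $r$ are assumptions made for the illustration.

```python
import math

def inertia_matrix(masses, positions):
    """Entries I_ks = sum_j m_j (|x_j|^2 delta_ks - x_jk x_js) as a nested list."""
    inertia = [[0.0] * 3 for _ in range(3)]
    for m, x in zip(masses, positions):
        norm2 = sum(c * c for c in x)
        for k in range(3):
            for s in range(3):
                delta = 1.0 if k == s else 0.0
                inertia[k][s] += m * (norm2 * delta - x[k] * x[s])
    return inertia

m, r = 2.0, 1.5  # sample mass and radius

# Three equal masses at the vertices of an equilateral triangle.
triangle = [(r, 0.0, 0.0),
            (-r / 2, math.sqrt(3) * r / 2, 0.0),
            (-r / 2, -math.sqrt(3) * r / 2, 0.0)]
I_triangle = inertia_matrix([m] * 3, triangle)

# Four equal masses at the vertices of a square.
square = [(r, 0.0, 0.0), (-r, 0.0, 0.0), (0.0, r, 0.0), (0.0, -r, 0.0)]
I_square = inertia_matrix([m] * 4, square)
```

Both matrices come out diagonal, with diagonal entries proportional to $(1, 1, 2)$: the moment of inertia around the axis orthogonal to the plane of the masses is the sum of the two in-plane moments.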


Chapter  12 

Spectral  Theorems  on  Hermitian  Spaces 



In  this  chapter  we  shall  extend  to  the  complex  case  some  of  the  notions  and  results 
of  Chap.  10  on  euclidean  spaces,  with  emphasis  on  spectral  theorems  for  a  natural 
class  of  endomorphisms. 


12.1  The  Adjoint  Endomorphism 

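Consider the vector space $\mathbb{C}^n$ and its dual space $\mathbb{C}^{n*}$, as defined in Sect. 8.1. The duality between $\mathbb{C}^n$ and $\mathbb{C}^{n*}$ allows one to define, for any endomorphism $\phi$ of $\mathbb{C}^n$, its adjoint.

Definition 12.1.1 Given $\phi : \mathbb{C}^n \to \mathbb{C}^n$, the map $\phi^\dagger : \omega \in \mathbb{C}^{n*} \mapsto \phi^\dagger(\omega) \in \mathbb{C}^{n*}$ defined by

\[ (\phi^\dagger(\omega))(v) = \omega(\phi(v)) \qquad (12.1) \]

for any $\omega \in \mathbb{C}^{n*}$ and any $v \in \mathbb{C}^n$ is called the adjoint to $\phi$.

Remark 12.1.2 From the linearity of $\phi$ and $\omega$ it follows that $\phi^\dagger$ is linear, so $\phi^\dagger \in \mathrm{End}(\mathbb{C}^{n*})$.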

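Example 12.1.3 Let $\mathcal{B} = (b_1, b_2)$ be a basis for $\mathbb{C}^2$, with $\mathcal{B}^* = (\beta_1, \beta_2)$ its dual basis for $\mathbb{C}^{2*}$. If $\phi$ is the endomorphism given by

\[ \phi : b_1 \mapsto k\, b_1 + b_2, \qquad \phi : b_2 \mapsto b_2, \]

with $k \in \mathbb{C}$, we see from the definition of adjoint that

\[ \phi^\dagger(\beta_1) : \quad b_1 \mapsto \beta_1(\phi(b_1)) = k, \qquad b_2 \mapsto \beta_1(\phi(b_2)) = 0, \]
\[ \phi^\dagger(\beta_2) : \quad b_1 \mapsto \beta_2(\phi(b_1)) = 1, \qquad b_2 \mapsto \beta_2(\phi(b_2)) = 1. \]

The (linear) action of the adjoint map to $\phi$ is then

\[ \phi^\dagger : \beta_1 \mapsto k\, \beta_1, \qquad \phi^\dagger : \beta_2 \mapsto \beta_1 + \beta_2. \]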

Consider now the canonical hermitian space $H^n = (\mathbb{C}^n, \cdot)$, that is the vector space $\mathbb{C}^n$ with the canonical hermitian product (see Sect. 3.4). As described in Sect. 8.2, the hermitian product allows one to identify $\mathbb{C}^{n*}$ with $\mathbb{C}^n$. Under such identification, the defining relation for $\phi^\dagger$ can be written as

$$(\phi^\dagger u) \cdot v = u \cdot (\phi v) \quad \text{or equivalently} \quad \langle \phi^\dagger(u) | v \rangle = \langle u | \phi(v) \rangle$$

for any $u, v \in \mathbb{C}^n$, so that $\phi^\dagger$ is an endomorphism of $H^n = (\mathbb{C}^n, \cdot)$.

Definition 12.1.4 Given a matrix $A = (a_{ij}) \in \mathbb{C}^{n,n}$, its adjoint $A^\dagger \in \mathbb{C}^{n,n}$ is the matrix whose entries are given by $(A^\dagger)_{ab} = \overline{a_{ba}}$.

Thus, taking the adjoint of a matrix is the composition of two compatible involutions, the transposition and the complex conjugation.
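
The following minimal sketch spells out the adjoint of a matrix as the composition of transposition and complex conjugation, in either order. Matrices are plain lists of rows, and the helper names (`transpose`, `conjugate`, `dagger`) are illustrative.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def conjugate(A):
    """Entrywise complex conjugation."""
    return [[z.conjugate() for z in row] for row in A]

def dagger(A):
    """Adjoint matrix: (A^dagger)_ab = conj(a_ba)."""
    return conjugate(transpose(A))

A = [[1 + 2j, 3j], [4 + 0j, 5 - 1j]]
A_dag = dagger(A)
print(A_dag)
print(transpose(conjugate(A)) == A_dag)  # the two involutions commute
print(dagger(A_dag) == A)                # the adjoint is itself an involution
```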

Exercise  12.1.5  Clearly 


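Exercise 12.1.6 By using the matrix calculus we described in the previous chapters, it comes as no surprise that the following relations hold:

$$(A^\dagger)^\dagger = A, \qquad (AB)^\dagger = B^\dagger A^\dagger, \qquad (A + \alpha B)^\dagger = A^\dagger + \bar{\alpha} B^\dagger$$

for any $A, B \in \mathbb{C}^{n,n}$ and $\alpha \in \mathbb{C}$. The second relation indeed parallels the Remark 8.2.1. If we have two endomorphisms $\phi, \psi \in \mathrm{End}(H^n)$, one has

$$\langle (\phi\psi)^\dagger(u) | v \rangle = \langle u | \phi\psi(v) \rangle = \langle \phi^\dagger(u) | \psi(v) \rangle = \langle \psi^\dagger\phi^\dagger(u) | v \rangle,$$

for any $u, v \in H^n$. With $\alpha \in \mathbb{C}$, it is also

$$\langle (\phi + \alpha\psi)^\dagger u | v \rangle = \langle u | (\phi + \alpha\psi) v \rangle = \langle \phi^\dagger(u) | v \rangle + \alpha \langle \psi^\dagger(u) | v \rangle = \langle (\phi^\dagger + \bar{\alpha}\psi^\dagger)(u) | v \rangle.$$

Again using the properties of the hermitian product together with the definition of adjoint, it is

$$\langle (\phi^\dagger)^\dagger u | v \rangle = \langle u | \phi^\dagger(v) \rangle = \langle \phi(u) | v \rangle.$$

The above lines establish the following identities

$$(\phi^\dagger)^\dagger = \phi, \qquad (\phi\psi)^\dagger = \psi^\dagger\phi^\dagger, \qquad (\phi + \alpha\psi)^\dagger = \phi^\dagger + \bar{\alpha}\psi^\dagger,$$

which are the operator counterpart of the matrix identities described above.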
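
These identities can be checked on a small example. The sketch below uses matrices with integer real and imaginary parts, so that all comparisons are exact, and takes the hermitian product to be antilinear in its first argument, as the computations above require. All names and sample values are illustrative.

```python
def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inner(u, v):
    """Hermitian product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

A = [[1 + 1j, 2 + 0j], [0j, 3 - 1j]]
B = [[0j, 1j], [1 + 0j, 1 + 0j]]
alpha = 2 - 3j
u, v = [1 + 0j, 1j], [2 - 1j, 3 + 0j]

# (AB)^dagger = B^dagger A^dagger
product_rule = dagger(matmul(A, B)) == matmul(dagger(B), dagger(A))

# (A + alpha B)^dagger = A^dagger + conj(alpha) B^dagger
combo = [[a + alpha * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
expected = [[a + alpha.conjugate() * b for a, b in zip(ra, rb)]
            for ra, rb in zip(dagger(A), dagger(B))]
sum_rule = dagger(combo) == expected

# <A^dagger u | v> = <u | A v>
defining_relation = inner(apply(dagger(A), u), v) == inner(u, apply(A, v))
print(product_rule, sum_rule, defining_relation)
```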

Definition 12.1.7 An endomorphism $\phi$ on $H^n$ is called

(a) self-adjoint, or hermitian, if $\phi = \phi^\dagger$, that is if $\langle \phi(u) | v \rangle = \langle u | \phi(v) \rangle$ for any $u, v \in H^n$,

(b) unitary, if $\phi\phi^\dagger = \phi^\dagger\phi = I$, that is if $\langle \phi(u) | \phi(v) \rangle = \langle u | v \rangle$ for any $u, v \in H^n$,

(c) normal, if $\phi\phi^\dagger = \phi^\dagger\phi$.

In parallel to these, a matrix $A \in \mathbb{C}^{n,n}$ is called

(a) self-adjoint, or hermitian, if $A^\dagger = A$,

(b) unitary, if $AA^\dagger = A^\dagger A = I_n$,

(c) normal, if $AA^\dagger = A^\dagger A$.
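
The three matrix conditions translate directly into code. The sketch below uses exact equality, which is adequate for the integer-valued sample matrices chosen here; floating-point data would need a tolerance instead. The predicate names are illustrative.

```python
def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_self_adjoint(A):
    return dagger(A) == A

def is_unitary(A):
    eye = [[1 if i == j else 0 for j in range(len(A))] for i in range(len(A))]
    return matmul(A, dagger(A)) == eye and matmul(dagger(A), A) == eye

def is_normal(A):
    return matmul(A, dagger(A)) == matmul(dagger(A), A)

H = [[2 + 0j, 1j], [-1j, 3 + 0j]]            # self-adjoint
U = [[0j, 1j], [1j, 0j]]                     # unitary
N = [[1 + 0j, 1 + 0j], [-1 + 0j, 1 + 0j]]    # normal, neither self-adjoint nor unitary
T = [[1 + 0j, 1 + 0j], [0j, 1 + 0j]]         # not normal

for name, A in [("H", H), ("U", U), ("N", N), ("T", T)]:
    print(name, is_self_adjoint(A), is_unitary(A), is_normal(A))
```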

Remark 12.1.8 Clearly the condition of unitarity for $\phi$ is equivalent to the condition $\phi^\dagger = \phi^{-1}$. Also, both unitary and self-adjoint endomorphisms are normal. From the Exercise 12.1.6 it follows that for any endomorphism $\phi$ the compositions $\phi\phi^\dagger$ and $\phi^\dagger\phi$ are self-adjoint.

Remark 12.1.9 The notion of adjoint of an endomorphism can be introduced also on euclidean spaces $E^n$, where it is identified, at a matrix level, by the transposition. Then, it is clear that the notion of self-adjointness in $H^n$ generalises that in $E^n$, since if $A = {}^tA$ in $E^n$, then $A = A^\dagger$ in $H^n$, while orthogonal matrices in $E^n$ are unitary matrices in $H^n$ with real entries.

The following theorem is the natural generalisation for hermitian spaces of a similar result for euclidean spaces. Its proof, which we omit, closely mimics that of the Theorem 10.1.11.

Theorem 12.1.10 Let $\mathcal{C}$ be an orthonormal basis for the hermitian vector space $H^n$ and let $\mathcal{B}$ be any other basis. The matrix $M^{\mathcal{C},\mathcal{B}}$ of the change of basis from $\mathcal{C}$ to $\mathcal{B}$ is unitary if and only if $\mathcal{B}$ is orthonormal.

The following proposition gives an ex-post motivation for the definitions above.

Proposition 12.1.11 If $\mathcal{E}$ is the canonical basis for $H^n$, with $\phi \in \mathrm{End}(H^n)$, it holds that

$$M_{\phi^\dagger}^{\mathcal{E},\mathcal{E}} = (M_\phi^{\mathcal{E},\mathcal{E}})^\dagger.$$

Proof Let $\mathcal{E} = (e_1, \dots, e_n)$ be the canonical basis for $H^n$. If $M_\phi^{\mathcal{E},\mathcal{E}} \in \mathbb{C}^{n,n}$ is the matrix that represents the action of $\phi$ on $H^n$ with respect to the basis $\mathcal{E}$, its entries are given (see 8.7) by

$$(M_\phi^{\mathcal{E},\mathcal{E}})_{ab} = \langle e_a | \phi(e_b) \rangle.$$

By denoting $\phi_{ab} = (M_\phi^{\mathcal{E},\mathcal{E}})_{ab}$, the action of $\phi$ is given by $\phi(e_a) = \sum_{b=1}^n \phi_{ba} e_b$, so we can compute

$$(\phi^\dagger)_{ab} = \langle e_a | \phi^\dagger(e_b) \rangle = \langle \phi(e_a) | e_b \rangle = \sum_{c=1}^n \overline{\phi_{ca}} \langle e_c | e_b \rangle = \overline{\phi_{ba}}. \qquad \square$$

As an application of this proposition, the next proposition also generalises to hermitian spaces analogous results proven in Chap. 10 for euclidean spaces.

Proposition 12.1.12 The endomorphism $\phi$ on $H^n$ is self-adjoint (resp. unitary, resp. normal) if and only if there exists an orthonormal basis $\mathcal{B}$ for $H^n$ with respect to which the matrix $M_\phi^{\mathcal{B},\mathcal{B}}$ is self-adjoint (resp. unitary, resp. normal).

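Exercise 12.1.13 Consider upper triangular matrices in $\mathbb{C}^{2,2}$,

$$M = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix}.$$

One explicitly computes

$$MM^\dagger = \begin{pmatrix} a\bar{a} + b\bar{b} & b\bar{c} \\ c\bar{b} & c\bar{c} \end{pmatrix}, \qquad M^\dagger M = \begin{pmatrix} a\bar{a} & \bar{a}b \\ a\bar{b} & b\bar{b} + c\bar{c} \end{pmatrix},$$

and the matrix $M$ is normal, $MM^\dagger = M^\dagger M$, if and only if $b\bar{b} = 0 \Leftrightarrow b = 0$. Thus a $2 \times 2$ upper triangular matrix is normal if and only if it is diagonal. In such a case, the matrix is self-adjoint if the diagonal entries are real, and unitary if the diagonal entries have modulus 1.

Exercise 12.1.14 We consider the following family of matrices in $\mathbb{C}^{2,2}$,

$$M = \begin{pmatrix} a & b \\ c & 0 \end{pmatrix}.$$

It is

$$MM^\dagger = \begin{pmatrix} a\bar{a} + b\bar{b} & a\bar{c} \\ c\bar{a} & c\bar{c} \end{pmatrix}, \qquad M^\dagger M = \begin{pmatrix} a\bar{a} + c\bar{c} & \bar{a}b \\ a\bar{b} & b\bar{b} \end{pmatrix}.$$

The conditions for which $M$ is normal are

$$b\bar{b} = c\bar{c}, \qquad a\bar{c} = \bar{a}b.$$

These are solved by $b = R e^{i\beta}$, $c = R e^{i\gamma}$, $a = |a| e^{i\alpha}$ with $2\alpha = (\beta + \gamma) \mod 2\pi$, where $R \geq 0$ and $|a| \geq 0$ are arbitrary moduli for complex numbers.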

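Exercise 12.1.15 With Dirac's notation as in (8.6), an endomorphism $\phi$ and its adjoint are written as

$$\phi = \sum_{a,b=1}^n \phi_{ab}\, |e_a\rangle\langle e_b| \quad \text{and} \quad \phi^\dagger = \sum_{a,b=1}^n \overline{\phi_{ba}}\, |e_a\rangle\langle e_b|$$

with $\phi_{ab} = \langle e_a | \phi(e_b) \rangle = (M_\phi^{\mathcal{E},\mathcal{E}})_{ab}$ with respect to the orthonormal basis $\mathcal{E} = (e_1, \dots, e_n)$.

With $u = (u_1, \dots, u_n)$ and $v = (v_1, \dots, v_n)$ vectors in $H^n$ we have the endomorphism $P = |u\rangle\langle v|$. If we decompose the identity endomorphism (see the point (c) from the Proposition 10.3.7) as

$$\mathrm{id} = \sum_{s=1}^n |e_s\rangle\langle e_s|$$

we can write

$$P = \sum_{a,b=1}^n |e_a\rangle\langle e_a | u\rangle\langle v | e_b\rangle\langle e_b| = \sum_{a,b=1}^n P_{ab}\, |e_a\rangle\langle e_b|$$

with $P_{ab} = u_a \bar{v}_b = \langle e_a | P(e_b) \rangle$. Clearly then

$$P^\dagger = |v\rangle\langle u|.$$

Example 12.1.16 Let $\phi$ be an endomorphism of $H^n$ with matrix $M_\phi^{\mathcal{E},\mathcal{E}}$ with respect to the canonical orthonormal basis, thus $(M_\phi^{\mathcal{E},\mathcal{E}})_{ab} = \langle e_a | \phi(e_b) \rangle$. If $\mathcal{B} = (b_1, \dots, b_n)$ is a second orthonormal basis for $H^n$, we have two decompositions

$$\mathrm{id} = \sum_{k=1}^n |e_k\rangle\langle e_k| = \sum_{s=1}^n |b_s\rangle\langle b_s|.$$

Thus, by inserting these two expressions of the identity operator, we have

$$\langle e_a | \phi(e_b) \rangle = \sum_{k,s=1}^n \langle e_a | b_k \rangle \langle b_k | \phi(b_s) \rangle \langle b_s | e_b \rangle,$$

giving in components,

$$(M_\phi^{\mathcal{E},\mathcal{E}})_{ab} = \sum_{k,s=1}^n (M^{\mathcal{E},\mathcal{B}})_{ak}\, (M_\phi^{\mathcal{B},\mathcal{B}})_{ks}\, (M^{\mathcal{B},\mathcal{E}})_{sb}.$$

The matrix of the change of basis from $\mathcal{E}$ to $\mathcal{B}$ has entries $\langle e_a | b_k \rangle = (M^{\mathcal{E},\mathcal{B}})_{ak}$, with its inverse matrix entries given by $(M^{\mathcal{B},\mathcal{E}})_{st} = \langle b_s | e_t \rangle$. From the previous examples we see that

$$((M^{\mathcal{B},\mathcal{E}})^\dagger)_{ak} = \overline{(M^{\mathcal{B},\mathcal{E}})_{ka}} = \overline{\langle b_k | e_a \rangle} = \langle e_a | b_k \rangle = (M^{\mathcal{E},\mathcal{B}})_{ak}$$

thus finding that the change of basis is given by a unitary matrix.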
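
The Dirac-notation manipulations of the last exercise and example can be mirrored numerically. The sketch below represents $|u\rangle\langle v|$ by the matrix with entries $u_a \bar{v}_b$, checks that its adjoint is $|v\rangle\langle u|$, and verifies that the change of basis matrix with entries $\langle e_a | b_k \rangle$ is unitary for one particular orthonormal basis of $H^2$. The vectors and the basis are illustrative choices.

```python
import math

def ket_bra(u, v):
    """Matrix of |u><v| in the canonical basis: entries u_a * conj(v_b)."""
    return [[ua * vb.conjugate() for vb in v] for ua in u]

def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def inner(u, v):
    """Hermitian product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

u, v = [1 + 1j, 2 + 0j], [3 + 0j, 1j]
P = ket_bra(u, v)
adjoint_is_swapped = dagger(P) == ket_bra(v, u)

# A second orthonormal basis of H^2 and the matrix with entries <e_a|b_k>.
s = 1 / math.sqrt(2)
basis = [[s + 0j, 1j * s], [s + 0j, -1j * s]]
canonical = [[1 + 0j, 0j], [0j, 1 + 0j]]
M_EB = [[inner(e, b) for b in basis] for e in canonical]

# Entries of (M_EB)^dagger M_EB, which should be the identity matrix.
gram = [[sum(M_EB[k][a].conjugate() * M_EB[k][b] for k in range(2)) for b in range(2)]
        for a in range(2)]
print(adjoint_is_swapped, gram)
```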

Proposition 12.1.17 For any endomorphism $\phi$ in $H^n$, there is an orthogonal vector space decomposition

$$H^n = \mathrm{Im}(\phi) \oplus \ker(\phi^\dagger).$$

Proof As $u$ ranges over $H^n$, the vector $\phi(u)$ covers all of $\mathrm{Im}(\phi)$, so the condition $\langle \phi(u) | w \rangle = 0$ for all $u$ characterises the elements $w \in (\mathrm{Im}(\phi))^\perp$. It is now easy to compute

$$0 = \langle \phi(u) | w \rangle = \langle u | \phi^\dagger(w) \rangle.$$

Since $u$ is arbitrary and the hermitian product is not degenerate, we have $\ker(\phi^\dagger) = (\mathrm{Im}(\phi))^\perp$. □


12.2 Spectral Theory for Normal Endomorphisms

We  prove  a  few  results  for  normal  endomorphisms  which  will  be  useful  for  spectral 
theorems. 

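Proposition 12.2.1 Let $\phi$ be a normal endomorphism of $H^n$.

(a) With $u \in H^n$, we can write

$$\|\phi(u)\|^2 = \langle \phi(u) | \phi(u) \rangle = \langle u | \phi^\dagger\phi(u) \rangle = \langle u | \phi\phi^\dagger(u) \rangle = \langle \phi^\dagger(u) | \phi^\dagger(u) \rangle = \|\phi^\dagger(u)\|^2.$$

Since the order of these computations can be reversed, we have the following characterisation:

$$\phi\phi^\dagger = \phi^\dagger\phi \quad \Leftrightarrow \quad \|\phi(u)\| = \|\phi^\dagger(u)\| \ \text{ for all } u \in H^n.$$

(b) From this it also follows that $\ker(\phi) = \ker(\phi^\dagger)$. So from the Proposition 12.1.17, we have the following orthogonal decomposition,

$$H^n = \mathrm{Im}(\phi) \oplus \ker(\phi).$$

(c) Clearly $(\phi - \lambda I)$ is a normal endomorphism if $\phi$ is such. This gives $\ker(\phi - \lambda I) = \ker(\phi^\dagger - \bar{\lambda} I)$, meaning that if $\lambda$ is an eigenvalue of a normal endomorphism $\phi$, then $\bar{\lambda}$ is an eigenvalue for $\phi^\dagger$, with the same eigenspaces.

(d) Let $\lambda, \mu$ be two distinct eigenvalues for $\phi$, with $\phi(v) = \lambda v$ and $\phi(w) = \mu w$. Then we have

$$(\lambda - \mu)\langle v | w \rangle = \langle \bar{\lambda} v | w \rangle - \langle v | \mu w \rangle = \langle \phi^\dagger(v) | w \rangle - \langle v | \phi(w) \rangle = 0.$$

We can conclude that the eigenspaces corresponding to distinct eigenvalues for a normal endomorphism are mutually orthogonal. □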
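
Point (a) lends itself to a direct check. In the sketch below the matrix `N` has the form $\alpha I + \beta X$ with $X$ real symmetric, hence it is normal, while `T` is upper triangular and not normal. Entries have integer real and imaginary parts so that the squared norms are computed exactly; all names and values are illustrative.

```python
def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm_sq(v):
    """Squared norm <v|v> of a complex vector."""
    return sum((z * z.conjugate()).real for z in v)

N = [[1 + 2j, 3 - 1j], [3 - 1j, 1 + 2j]]   # normal: (1 + 2i) I + (3 - i) X, X = [[0, 1], [1, 0]]
T = [[1 + 0j, 1 + 0j], [0j, 1 + 0j]]       # not normal
u = [1 + 0j, 2j]
e1 = [1 + 0j, 0j]

normal_pair = (norm_sq(apply(N, u)), norm_sq(apply(dagger(N), u)))
non_normal_pair = (norm_sq(apply(T, e1)), norm_sq(apply(dagger(T), e1)))
print(normal_pair)      # the two squared norms coincide
print(non_normal_pair)  # the two squared norms differ
```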

We  are  ready  to  characterise  a  normal  operator  in  terms  of  its  spectral  properties. 
The  proof  of  the  following  result  generalises  to  hermitian  spaces  the  proof  of  the 
Theorem  10.4.5  on  the  diagonalization  of  symmetric  endomorphisms  on  euclidean 
spaces. 

Theorem 12.2.2 An endomorphism $\phi$ of $H^n$ is normal if and only if there exists an orthonormal basis for $H^n$ made of eigenvectors for $\phi$.

Proof If $\mathcal{B} = (b_1, \dots, b_n)$ is an orthonormal basis of eigenvectors for $\phi$, with corresponding eigenvalues $(\lambda_1, \dots, \lambda_n)$, we can write

$$\phi = \sum_{a=1}^n \lambda_a\, |b_a\rangle\langle b_a| \quad \text{and} \quad \phi^\dagger = \sum_{a=1}^n \bar{\lambda}_a\, |b_a\rangle\langle b_a|$$

which directly yields (see the Exercise 12.1.15)

$$\phi\phi^\dagger = \sum_{a=1}^n |\lambda_a|^2\, |b_a\rangle\langle b_a| = \phi^\dagger\phi.$$

The converse, the less trivial part of the statement, is proven once again by induction.

Consider first a normal operator $\phi$ on the two dimensional hermitian space $H^2$. With respect to any basis, the characteristic polynomial $p_\phi(T)$ has two complex roots, from the fundamental theorem of algebra. A normal endomorphism of $H^2$ with only the zero eigenvalue would be the null endomorphism. So we can assume there is a root $\lambda \neq 0$, with $v$ a (normalised) eigenvector, that is $\phi(v) = \lambda v$ with $\|v\| = 1$. If $\mathcal{C} = (v, w)$ is an orthonormal basis for $H^2$ that completes $v$, we have, from point (c) above,

$$\langle \phi(w) | v \rangle = \langle w | \phi^\dagger(v) \rangle = \bar{\lambda}\, \langle w | v \rangle = 0.$$

This shows that $\phi(w)$ is orthogonal to $v$, so that there must exist a scalar $\mu$ such that $\phi(w) = \mu w$. In turn this shows that if $\phi$ is a normal endomorphism of $H^2$, then $H^2$ has an orthonormal basis of eigenvectors for $\phi$.

Inductively, let us assume that the statement is valid when the dimension of the hermitian space is $n - 1$. The $n$-dimensional case is treated analogously to what was done above. If $\phi$ is a normal endomorphism of $H^n$, its characteristic polynomial $p_\phi(T)$ has at least one non zero complex root, $\lambda$ say, with $v$ a corresponding normalised eigenvector: $\phi(v) = \lambda v$, with $\|v\| = 1$. (Again, a normal endomorphism of $H^n$ with only the zero eigenvalue is the null endomorphism.) Writing $V_\lambda$ for the line spanned by $v$, we have $H^n = V_\lambda \oplus V_\lambda^\perp$, and $v$ can be completed to an orthonormal basis $\mathcal{C} = (v, w_1, \dots, w_{n-1})$ for $H^n$. If $w \in V_\lambda^\perp$ we compute as above

$$\langle \phi(w) | v \rangle = \langle w | \phi^\dagger(v) \rangle = \bar{\lambda}\, \langle w | v \rangle = 0.$$

This shows that $\phi$ maps $V_\lambda^\perp$ to itself, while also $\phi^\dagger$ maps $V_\lambda^\perp$ to itself since

$$\langle \phi^\dagger(w) | v \rangle = \langle w | \phi(v) \rangle = \lambda\, \langle w | v \rangle = 0.$$

The restriction of $\phi$ to $V_\lambda^\perp$ is then a normal operator on an $(n - 1)$ dimensional hermitian space, and by assumption there exists an orthonormal basis $(u_1, \dots, u_{n-1})$ for $V_\lambda^\perp$ made of eigenvectors for $\phi$. The collection $(v, u_1, \dots, u_{n-1})$ is then an orthonormal basis for $H^n$ of eigenvectors for $\phi$. □

Remark 12.2.3 Since the field of real numbers is not algebraically closed (and the fundamental theorem of algebra is valid on $\mathbb{C}$), it is worth stressing that an analogue of this theorem for normal endomorphisms on euclidean spaces does not hold. A matrix $A \in \mathbb{R}^{n,n}$ such that $({}^tA) A = A ({}^tA)$ need not be diagonalisable over $\mathbb{R}$. An example is given by a non zero antisymmetric (skew-adjoint, see Sect. 11.1) matrix $A$, which clearly commutes with ${}^tA$, being nonetheless not diagonalisable over $\mathbb{R}$.

We showed in the Remark 12.1.8 that self-adjoint and unitary endomorphisms are normal. Within the set of normal endomorphisms, they can be characterised in terms of their spectrum.

If $\lambda$ is an eigenvalue of a self-adjoint endomorphism $\phi$, with $\phi(v) = \lambda v$, then

$$\lambda v = \phi(v) = \phi^\dagger(v) = \bar{\lambda} v$$

and thus one has $\lambda = \bar{\lambda}$. If $\lambda$ is an eigenvalue for a unitary operator $\phi$, with $\phi(v) = \lambda v$, then

$$\|v\|^2 = \langle \phi(v) | \phi(v) \rangle = |\lambda|^2 \|v\|^2$$

which gives $|\lambda| = 1$. It is easy to show also the converse of these claims, so to have the following.

Theorem 12.2.4 A normal operator on $H^n$ is self-adjoint if and only if its eigenvalues are real. A normal operator on $H^n$ is unitary if and only if its eigenvalues have modulus 1.

As a corollary, by merging the previous two theorems, we have a characterisation of self-adjoint and unitary operators in terms of their spectral properties, as follows.

Corollary 12.2.5 An endomorphism $\phi$ on $H^n$ is self-adjoint if and only if its spectrum is real and there exists an orthonormal basis for $H^n$ of eigenvectors for $\phi$. An endomorphism $\phi$ on $H^n$ is unitary if and only if its spectrum is a subset of the unit circle in $\mathbb{C}$, and there exists an orthonormal basis for $H^n$ of eigenvectors for $\phi$.

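Exercise 12.2.6 Consider the hermitian space $H^2$, with $\mathcal{E} = (e_1, e_2)$ its canonical orthonormal basis, and the endomorphism $\phi$ represented with respect to $\mathcal{E}$ by

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix} \quad \text{with } a \in \mathbb{R}.$$

This endomorphism is not diagonalisable over $\mathbb{R}$, since it is antisymmetric (see Sect. 11.1 and the Remark 12.2.3). Being normal with respect to the hermitian structure in $H^2$, there exists an orthonormal basis for $H^2$ of eigenvectors for $\phi$. The eigenvalue equation is $p_\phi(T) = T^2 + a^2 = 0$, so the eigenvalues are $\lambda_\pm = \pm i a$, with normalised eigenvectors $u_\pm$ given by

$$\lambda_\pm = \pm i a, \qquad u_\pm = \tfrac{1}{\sqrt{2}} (1, \pm i)_{\mathcal{E}},$$

while the unitary conjugation that diagonalises the matrix $M_\phi^{\mathcal{E},\mathcal{E}}$ is given by

$$\frac{1}{2} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} = \begin{pmatrix} i a & 0 \\ 0 & -i a \end{pmatrix}.$$

The comparison of this with the content of the Example 12.1.16 follows by writing the matrix giving the change of basis from $\mathcal{E}$ to $\mathcal{B} = (u_+, u_-)$ as

$$M^{\mathcal{B},\mathcal{E}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} = \begin{pmatrix} \langle u_+ | e_1 \rangle & \langle u_+ | e_2 \rangle \\ \langle u_- | e_1 \rangle & \langle u_- | e_2 \rangle \end{pmatrix}.$$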
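
The diagonalisation in this exercise can be verified numerically. The sketch below takes the illustrative value $a = 3$, builds the matrix $V$ whose columns are the normalised eigenvectors $u_\pm = \tfrac{1}{\sqrt{2}}(1, \pm i)$ and computes $V^\dagger M V$; the helper names are not from the text.

```python
import math

def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

a = 3.0  # illustrative value of the real parameter
M = [[0j, a + 0j], [-a + 0j, 0j]]
s = 1 / math.sqrt(2)
V = [[s + 0j, s + 0j], [1j * s, -1j * s]]  # columns are u_+ and u_-

D = matmul(matmul(dagger(V), M), V)        # expected: diag(i a, -i a)
VdV = matmul(dagger(V), V)                 # expected: the identity matrix
print(D)
```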

We next study a family of normal endomorphisms, which will be useful when considering the properties of unitary matrices. The following definition comes naturally from the Definition 12.1.7.

Definition 12.2.7 An endomorphism $\phi$ in $H^n$ is named skew-adjoint if $\langle u | \phi(v) \rangle + \langle \phi(u) | v \rangle = 0$ for any $u, v \in H^n$. A matrix $A \in \mathbb{C}^{n,n}$ is named skew-adjoint if $A^\dagger = -A$.

We list some important results on skew-adjoint endomorphisms and matrices.

(a) It is clear that an endomorphism $\phi$ on $H^n$ is skew-adjoint if and only if there exists an orthonormal basis $\mathcal{E}$ for $H^n$ with respect to which the matrix $M_\phi^{\mathcal{E},\mathcal{E}}$ is skew-adjoint.

(b) Skew-adjoint endomorphisms are normal. We know from the Proposition 12.2.1 point (c), that if $\lambda$ is an eigenvalue for the endomorphism $\phi$, then $\bar{\lambda}$ is an eigenvalue for $\phi^\dagger$. This means that if $\lambda$ is an eigenvalue for a skew-adjoint endomorphism $\phi$, then $\bar{\lambda} = -\lambda$, so any eigenvalue for a skew-adjoint endomorphism is either purely imaginary or zero.

(c) There exists an orthonormal basis $\mathcal{E} = (e_1, \dots, e_n)$ of eigenvectors for $\phi$ such that

$$\phi = \sum_{a=1}^n i \lambda_a\, |e_a\rangle\langle e_a| \quad \text{with } \lambda_a \in \mathbb{R}.$$

(d) The real vector space of skew-adjoint matrices $A = -A^\dagger \in \mathbb{C}^{n,n}$ is a matrix Lie algebra (see the Definition 11.1.6), that is the commutator of skew-adjoint matrices is a skew-adjoint matrix; it is denoted $\mathfrak{u}(n)$ and it has real dimension $n^2$.

Remark 12.2.8 In parallel with the Remark 11.1.9, self-adjoint matrices do not make up a Lie algebra since the commutator of two self-adjoint matrices is a skew-adjoint matrix.
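
Both commutator statements can be observed on explicit $2 \times 2$ matrices. The sketch below uses entries with integer real and imaginary parts, so exact comparison is safe; the sample matrices and helper names are illustrative.

```python
def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def commutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(XY, YX)]

def is_skew_adjoint(A):
    return dagger(A) == [[-z for z in row] for row in A]

def is_self_adjoint(A):
    return dagger(A) == A

X = [[1j, 2 + 0j], [-2 + 0j, 0j]]         # skew-adjoint
Y = [[0j, 1j], [1j, 2j]]                  # skew-adjoint
H1 = [[1 + 0j, 1j], [-1j, 0j]]            # self-adjoint
H2 = [[0j, 1 + 0j], [1 + 0j, 2 + 0j]]     # self-adjoint

print(is_skew_adjoint(commutator(X, Y)))    # the commutator stays skew-adjoint
print(is_skew_adjoint(commutator(H1, H2)))  # skew-adjoint, not self-adjoint
```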

Exercise 12.2.9 On the hermitian space $H^3$ we consider the endomorphism $\phi$ whose representing matrix is, with respect to the canonical basis $\mathcal{E}$, given by

$$M_\phi^{\mathcal{E},\mathcal{E}} = \begin{pmatrix} 0 & i & a \\ i & 0 & 0 \\ -a & 0 & 0 \end{pmatrix},$$

with $a$ a real parameter. Since $(M_\phi^{\mathcal{E},\mathcal{E}})^\dagger = -M_\phi^{\mathcal{E},\mathcal{E}}$, then $\phi$ is skew-adjoint (and thus normal). Its characteristic equation

$$p_\phi(T) = -T(1 + a^2 + T^2) = 0$$

has solutions $\lambda_0 = 0$ and $\lambda_\pm = \pm i\sqrt{1 + a^2}$. Explicit calculations show that the eigenspaces are given by $\ker(\phi) = V_{\lambda = 0} = \mathcal{L}(u_0)$ and $V_{\lambda_\pm} = \mathcal{L}(u_\pm)$ with

$$u_0 = \frac{1}{\sqrt{1 + a^2}}\, (0, i a, 1), \qquad u_\pm = \frac{1}{\sqrt{2(1 + a^2)}}\, (\sqrt{1 + a^2}, \pm 1, \pm i a).$$

It is immediate to see that the set $\mathcal{B} = (u_0, u_\pm)$ gives an orthonormal basis for $H^3$.
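
The spectral data of this exercise can be checked numerically. The sketch below fixes the illustrative value $a = 2$, uses eigenvectors normalised to unit length (the factors $1/\sqrt{1 + a^2}$ and $1/\sqrt{2(1 + a^2)}$), and tests both the eigenvalue equations and orthonormality. Helper names are not from the text.

```python
import math

def apply(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]

def inner(u, v):
    """Hermitian product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

a = 2.0  # illustrative value of the real parameter
M = [[0j, 1j, a + 0j], [1j, 0j, 0j], [-a + 0j, 0j, 0j]]
root = math.sqrt(1 + a * a)
c = 1 / math.sqrt(2 * (1 + a * a))

u0 = [0j, 1j * a / root, 1 / root + 0j]
u_plus = [c * root + 0j, c + 0j, 1j * a * c]
u_minus = [c * root + 0j, -c + 0j, -1j * a * c]

# Largest deviation from M u = lambda u over the three eigenpairs.
pairs = [(0j, u0), (1j * root, u_plus), (-1j * root, u_minus)]
residual = max(abs(y - lam * x) for lam, vec in pairs for y, x in zip(apply(M, vec), vec))

# Gram matrix of the three vectors, which should be the identity.
vectors = [u0, u_plus, u_minus]
gram = [[inner(p, q) for q in vectors] for p in vectors]
print(residual, gram)
```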

Exercise 12.2.10 We close this section by studying an endomorphism which is not normal, and is nonetheless diagonalisable, with an eigenvector basis which is not orthogonal. In $H^2$ with respect to $\mathcal{E} = (e_1, e_2)$, consider the endomorphism whose representing matrix is

$$M = \begin{pmatrix} 0 & 1 \\ a & 0 \end{pmatrix}$$

with $a \in \mathbb{R}$, $a > 0$. Then $M$ is normal if and only if $a = 1$. The characteristic equation is

$$p_M(T) = T^2 - a = 0$$

so its spectral decomposition is given by

$$\lambda_\pm = \pm\sqrt{a}, \qquad V_{\lambda_\pm} = \mathcal{L}(u_\pm) \quad \text{with } u_\pm = (1, \pm\sqrt{a})_{\mathcal{E}}.$$

Since $\langle u_+ | u_- \rangle = 1 - a$, the eigenvectors are orthogonal if and only if $M$ is normal.


12.3  The  Unitary  Group 

If $A, B \in \mathbb{C}^{n,n}$ are two unitary matrices, $A^\dagger A = I_n$ and $B^\dagger B = I_n$ (see the Definition 12.1.7), one has $(AB)^\dagger AB = B^\dagger A^\dagger A B = I_n$. Furthermore, $\det(A^\dagger) = \overline{\det(A)}$, so from $\det(AA^\dagger) = 1$ we have $|\det(A)| = 1$. Clearly, the identity matrix $I_n$ is unitary and these facts lead to the following definition.

Definition 12.3.1 The collection of $n \times n$ unitary matrices is a group, called the unitary group of order $n$ and denoted $\mathrm{U}(n)$. The subset $\mathrm{SU}(n) = \{A \in \mathrm{U}(n) : \det(A) = 1\}$ is a subgroup of $\mathrm{U}(n)$, called the special unitary group of order $n$.

Remark 12.3.2 With the natural inclusion of real matrices as complex matrices whose entries are invariant under complex conjugation, it is clear that $\mathrm{O}(n)$ is a subgroup of $\mathrm{U}(n)$ and $\mathrm{SO}(n)$ is a subgroup of $\mathrm{SU}(n)$.


Now, the exponential of a matrix as in the Definition 11.2.1 can be extended to complex matrices. Thus, for a matrix $A \in \mathbb{C}^{n,n}$, its exponential is defined by the expansion,

$$e^A = \sum_{k=0}^\infty \frac{1}{k!} A^k.$$

Then, all properties in the Proposition 11.2.2 have a counterpart for complex matrices, with point (e) there now reading $e^{A^\dagger} = (e^A)^\dagger$.
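
For small matrices the expansion can be truncated to approximate $e^A$ numerically. The sketch below is a naive implementation with a fixed number of terms (the helper name `expm` and the cut-off are assumptions, adequate only for matrices of moderate norm). As a sanity check it tests the relation $e^{A^\dagger} = (e^A)^\dagger$ on a sample matrix, and it exponentiates a skew-adjoint traceless matrix, for which the result comes out unitary with determinant 1.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def expm(A, terms=40):
    """Truncated exponential series: sum of A^k / k! for k < terms."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]  # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def distance(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

A = [[1j, 1 + 0j], [0j, 2 + 0j]]
adjoint_gap = distance(expm(dagger(A)), dagger(expm(A)))

M = [[1j, 2 + 0j], [-2 + 0j, -1j]]  # skew-adjoint and traceless
U = expm(M)
unitarity_gap = distance(matmul(U, dagger(U)), [[1, 0], [0, 1]])
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
print(adjoint_gap, unitarity_gap, det_U)
```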

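Theorem 12.3.3 Let $M, U \in \mathbb{C}^{n,n}$. One has the following results.

(a) If $M^\dagger = -M$, then $e^M \in \mathrm{U}(n)$. If $M^\dagger = -M$ and $\mathrm{tr}(M) = 0$, then $e^M \in \mathrm{SU}(n)$.

(b) Conversely, if $UU^\dagger = I_n$, there exists a skew-adjoint matrix $M = -M^\dagger$ such that $U = e^M$. If $U$ is a special unitary matrix, there exists a skew-adjoint traceless matrix, $M = -M^\dagger$ with $\mathrm{tr}(M) = 0$, such that $U = e^M$.

Proof Let $M$ be a skew-adjoint matrix. From the previous section we know that there exists a unitary matrix $V$ such that $M = V \Delta_M V^\dagger$, with $\Delta_M = \mathrm{diag}(i\rho_1, \dots, i\rho_n)$ for $\rho_a \in \mathbb{R}$. We can then write

$$e^M = e^{V \Delta_M V^\dagger} = V e^{\Delta_M} V^\dagger$$

with $e^{\Delta_M} = \mathrm{diag}(e^{i\rho_1}, \dots, e^{i\rho_n})$. This means that $e^{\Delta_M}$ is a unitary matrix, and we can conclude that the starting matrix $e^M$ is unitary. If $\mathrm{tr}(M) = 0$, then $\det(e^M) = e^{i(\rho_1 + \dots + \rho_n)} = e^{\mathrm{tr}(M)} = 1$, so $e^M$ is a special unitary matrix.

Alternatively, the result can be shown as follows. If $M = -M^\dagger$, then

$$(e^M)^\dagger = e^{M^\dagger} = e^{-M} = (e^M)^{-1}.$$

This concludes the proof of point (a).

Consider then a unitary matrix $U$. Since $U$ is normal, there exists a unitary matrix $V$ such that $U = V \Delta_U V^\dagger$ with $\Delta_U = \mathrm{diag}(e^{i\varphi_1}, \dots, e^{i\varphi_n})$, where $e^{i\varphi_k}$ are the modulus 1 eigenvalues of $U$. Clearly, the matrix $\Delta_U$ can be written as

$$\Delta_U = e^{\delta_U}$$

with $\delta_U = \mathrm{diag}(i\varphi_1, \dots, i\varphi_n) = -(\delta_U)^\dagger$. This means that

$$U = V e^{\delta_U} V^\dagger = e^{V \delta_U V^\dagger}$$

with $(V \delta_U V^\dagger)^\dagger = -(V \delta_U V^\dagger)$. If $U \in \mathrm{SU}(n)$, then $\det(U) = e^{i(\varphi_1 + \dots + \varphi_n)} = 1$, so the phases $\varphi_k$ (each defined up to multiples of $2\pi$) can be chosen with $\varphi_1 + \dots + \varphi_n = 0$, and one has $\mathrm{tr}(V \delta_U V^\dagger) = 0$. This establishes point (b) and concludes the proof. □

Exercise 12.3.4 Consider the matrix, with $a, b \in \mathbb{R}$,

$$A = \begin{pmatrix} b & a \\ a & b \end{pmatrix}.$$
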
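Its eigenvalues $\lambda$ are given by the solutions of the characteristic equation

$$p_A(T) = (b - T)^2 - a^2 = (b - T - a)(b - T + a) = 0.$$

Its spectral decomposition turns out to be

$$\lambda_\pm = b \pm a, \qquad V_{\lambda_\pm} = \mathcal{L}((1, \pm 1)).$$

To exponentiate the skew-adjoint matrix $iA$ we can follow two ways.

• By normalising the eigenvectors, we have the conjugation with its diagonal form $A = V \Delta_A V^\dagger$,

$$\begin{pmatrix} b & a \\ a & b \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} b - a & 0 \\ 0 & b + a \end{pmatrix} \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

so we have

$$e^{iA} = V e^{i\Delta_A} V^\dagger = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} e^{i(b-a)} & 0 \\ 0 & e^{i(a+b)} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$

$$= \frac{1}{2} \begin{pmatrix} e^{i(b-a)} + e^{i(a+b)} & -e^{i(b-a)} + e^{i(a+b)} \\ -e^{i(b-a)} + e^{i(a+b)} & e^{i(b-a)} + e^{i(a+b)} \end{pmatrix} = \begin{pmatrix} e^{ib}\cos a & i e^{ib}\sin a \\ i e^{ib}\sin a & e^{ib}\cos a \end{pmatrix}.$$

Notice that $\det(e^{iA}) = e^{2ib} = e^{i\,\mathrm{tr}(A)}$.

• By setting

$$A = \tilde{A} + \tilde{B} = \begin{pmatrix} 0 & a \\ a & 0 \end{pmatrix} + \begin{pmatrix} b & 0 \\ 0 & b \end{pmatrix}$$

we see that $A$ is the sum of two commuting matrices, since $\tilde{B} = b I_2$. So we can write

$$e^{iA} = e^{i(\tilde{A} + \tilde{B})} = e^{i\tilde{A}} e^{i\tilde{B}}.$$

Since $\tilde{B}$ is diagonal, $e^{i\tilde{B}} = \mathrm{diag}(e^{ib}, e^{ib})$. Computing as in the Exercise 11.2.4 we have

$$(i\tilde{A})^{2k} = (-1)^k a^{2k} I_2, \qquad (i\tilde{A})^{2k+1} = \begin{pmatrix} 0 & (-1)^k i\, a^{2k+1} \\ (-1)^k i\, a^{2k+1} & 0 \end{pmatrix},$$

so that

$$e^{i\tilde{A}} = \begin{pmatrix} \cos a & i\sin a \\ i\sin a & \cos a \end{pmatrix}$$

and

$$e^{iA} = \begin{pmatrix} \cos a & i\sin a \\ i\sin a & \cos a \end{pmatrix} \begin{pmatrix} e^{ib} & 0 \\ 0 & e^{ib} \end{pmatrix} = \begin{pmatrix} e^{ib}\cos a & i e^{ib}\sin a \\ i e^{ib}\sin a & e^{ib}\cos a \end{pmatrix}.$$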

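Exercise 12.3.5 In this exercise we describe how to reverse the construction of the previous one. That is, given the unitary matrix

$$U = \frac{1}{\sqrt{1 + a^2}} \begin{pmatrix} a & 1 \\ -1 & a \end{pmatrix}$$

with $a \in \mathbb{R}$, we determine the self-adjoint matrix $A = A^\dagger$ such that $U = e^{iA}$. Via the usual techniques it is easy to show that the spectral decomposition of $U$ is given by

$$\lambda_\pm = \frac{a \pm i}{\sqrt{1 + a^2}} \quad \text{with } V_{\lambda_\pm} = \mathcal{L}((1, \pm i)).$$

Notice that $|\lambda_\pm| = 1$ so we can write $\lambda_\pm = e^{i\varphi_\pm}$ and, by normalising the eigenvectors for $U$,

$$U = V \Delta_U V^\dagger = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} \begin{pmatrix} e^{i\varphi_-} & 0 \\ 0 & e^{i\varphi_+} \end{pmatrix} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}$$

with $V^\dagger V = I_2$. Since $\Delta_U = e^{i\delta_U}$ with $\delta_U = \delta_U^\dagger = \mathrm{diag}(\varphi_-, \varphi_+)$, we write

$$U = V e^{i\delta_U} V^\dagger = e^{i V \delta_U V^\dagger} = e^{iA}$$

where $A = A^\dagger$ with

$$A = V \delta_U V^\dagger = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} \begin{pmatrix} \varphi_- & 0 \\ 0 & \varphi_+ \end{pmatrix} \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \varphi_- + \varphi_+ & i(\varphi_- - \varphi_+) \\ -i(\varphi_- - \varphi_+) & \varphi_- + \varphi_+ \end{pmatrix}.$$

Notice that the matrix $A$ is not uniquely determined by $U$, since the angular variables $\varphi_\pm$ are defined up to $2\pi$ periodicity by

$$\cos\varphi_\pm = \frac{a}{\sqrt{1 + a^2}}, \qquad \sin\varphi_\pm = \pm\frac{1}{\sqrt{1 + a^2}}.$$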

We close this section by considering one parameter groups of unitary matrices. We start with a self-adjoint matrix $A = A^\dagger \in \mathbb{C}^{n,n}$, and define the matrix

$$U_s = e^{isA}, \quad \text{for } s \in \mathbb{R}.$$

From the properties of the exponential of a matrix, it is easy to show that, for any real $s, s'$, the following identities hold.

(i) $(U_s)^\dagger U_s = I_n$, that is $U_s$ is unitary,

(ii) $U_0 = I_n$,

(iii) $(U_s)^\dagger = U_{-s}$,

(iv) $U_{s+s'} = U_s U_{s'} = U_{s'} U_s$, thus in particular, these unitary matrices commute for different values of the parameter.

The map $\mathbb{R} \to \mathrm{U}(n)$ given by $s \mapsto U_s$ is, according to the definition in the Appendix A.4, a group homomorphism between $(\mathbb{R}, +)$ and $\mathrm{U}(n)$ (with group multiplication), that is between the abelian group $\mathbb{R}$ with respect to the sum and the non abelian group $\mathrm{U}(n)$ with respect to the matrix product. This leads to the following definition.


Definition 12.3.6 If $U_s$ is a family (labelled by a real parameter $s$) of elements in $\mathrm{U}(n)$ such that, for any value of $s \in \mathbb{R}$, the above identities (ii) to (iv) are fulfilled, then $U_s$ is called a one parameter group of unitary matrices of order $n$.

For any self-adjoint matrix $A$, we have a one parameter group of unitary matrices given by $U_s = e^{isA}$. The matrix $A$ is usually called the infinitesimal generator of the one parameter group.
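
The group identities can be tested on the one parameter group generated by the self-adjoint matrix $A = \begin{pmatrix} b & a \\ a & b \end{pmatrix}$ of Exercise 12.3.4, for which $e^{isA}$ is known in closed form. In the sketch below the function name `U`, the values of $a$, $b$ and the sample parameters are illustrative.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def U(s, a=0.7, b=0.4):
    """U_s = exp(i s A) for A = [[b, a], [a, b]], from the closed form of Exercise 12.3.4."""
    phase = cmath.exp(1j * s * b)
    c, sn = math.cos(s * a), math.sin(s * a)
    return [[phase * c, 1j * phase * sn], [1j * phase * sn, phase * c]]

def distance(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

identity = [[1, 0], [0, 1]]
s, t = 0.9, -2.3
checks = {
    "(i)": distance(matmul(dagger(U(s)), U(s)), identity),
    "(ii)": distance(U(0.0), identity),
    "(iii)": distance(dagger(U(s)), U(-s)),
    "(iv)": distance(U(s + t), matmul(U(s), U(t))),
    "commute": distance(matmul(U(s), U(t)), matmul(U(t), U(s))),
}
print(checks)  # every entry should vanish up to rounding
```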

Proposition 12.3.7 For any $A = A^\dagger \in \mathbb{C}^{n,n}$, the elements $U_s = e^{isA}$ give a one parameter group of unitary matrices in $H^n$. Conversely, if $U_s$ is a one parameter group of unitary matrices in $H^n$, there exists a self-adjoint matrix $A = A^\dagger$ such that $U_s = e^{isA}$.

Proof Let $U_s \in \mathrm{U}(n)$ be a one parameter group of unitary matrices. For each value $s \in \mathbb{R}$ the matrix $U_s$ can be diagonalised, and since $U_s$ commutes with any $U_{s'}$, it follows that there exists an orthonormal basis $\mathcal{B}$ for $H^n$ of common eigenvectors for any $U_s$. So there is a unitary matrix $V$ (providing the change of basis from $\mathcal{B}$ to the canonical basis $\mathcal{E}$) such that

$$U_s = V \{\mathrm{diag}(e^{i\varphi_1(s)}, \dots, e^{i\varphi_n(s)})\} V^\dagger$$

where $e^{i\varphi_k(s)}$ are the eigenvalues of $U_s$. From the condition $U_s U_{s'} = U_{s+s'}$ it follows that the dependence of the phases $\varphi_k$ on the parameter $s$ is linear (assuming, as is customary, that $U_s$ depends continuously on $s$), and from $U_0 = I_n$ we know that $\varphi_k(s = 0) = 0$. We can eventually write

$$U_s = V \{\mathrm{diag}(e^{is\varphi_1}, \dots, e^{is\varphi_n})\} V^\dagger = V e^{is\delta} V^\dagger = e^{is V \delta V^\dagger}$$

where $\delta = \mathrm{diag}(\varphi_1, \dots, \varphi_n)$ is a self-adjoint matrix. We then set $A = V \delta V^\dagger = A^\dagger$ to be the infinitesimal generator of the given one parameter group of unitary matrices. □


Chapter 13

Quadratic Forms


13.1 Quadratic Forms on Real Vector Spaces

In Sect. 3.1 the notion of scalar product on a finite dimensional real vector space has been introduced as a bilinear symmetric map $\cdot : V \times V \to \mathbb{R}$ with additional properties. Such additional properties are that $v \cdot v \geq 0$ for $v \in V$, with $v \cdot v = 0 \Leftrightarrow v = 0_V$. This is referred to as positive definiteness.

We  start  by  introducing  the  more  general  notion  of  quadratic  form. 

Definition 13.1.1 Let $V$ be a finite dimensional real vector space. A quadratic form on $V$ is a map

$$Q : V \times V \to \mathbb{R}, \qquad (v, w) \mapsto Q(v, w)$$

that fulfils the following properties. For any $v, w, v_1, v_2 \in V$ and $a_1, a_2 \in \mathbb{R}$ it holds that:

(Q1) $Q(v, w) = Q(w, v)$,

(Q2) $Q(a_1 v_1 + a_2 v_2, w) = a_1 Q(v_1, w) + a_2 Q(v_2, w)$.

When a quadratic form is positive definite, that is for any $v \in V$ the additional conditions

(E1) $Q(v, v) \geq 0$;

(E2) $Q(v, v) = 0 \Leftrightarrow v = 0_V$

are satisfied, then $Q$ is a scalar product, and we say that $V$ is a euclidean space.

With respect to a basis $\mathcal{B} = (u_1, \dots, u_n)$ for $V$, the conditions Q1 and Q2 are clearly satisfied if and only if there exists a symmetric matrix $F = (F_{ab}) \in \mathbb{R}^{n,n}$ such that

$$Q(v, w) = Q((v_1, \dots, v_n)_{\mathcal{B}}, (w_1, \dots, w_n)_{\mathcal{B}}) = \sum_{a,b=1}^n F_{ab}\, v_a w_b.$$

This expression can be also written as

$$Q(v, w) = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} F \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix}.$$

Not surprisingly, the matrix representing the action of the quadratic form $Q$ depends on the basis considered in $V$. Under a change of basis $\mathcal{B} \to \mathcal{B}'$ with $\mathcal{B}' = (u'_1, \dots, u'_n)$ and corresponding matrix $M^{\mathcal{B},\mathcal{B}'}$, as we know, the components of the vectors $v, w$ are transformed as

$$\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = M^{\mathcal{B},\mathcal{B}'} \begin{pmatrix} v'_1 \\ \vdots \\ v'_n \end{pmatrix}$$

and analogously for $w$. So we write the action of the quadratic form $Q$ as

$$Q(v, w) = \begin{pmatrix} v'_1 & \cdots & v'_n \end{pmatrix} \left( {}^tM^{\mathcal{B},\mathcal{B}'}\, F\, M^{\mathcal{B},\mathcal{B}'} \right) \begin{pmatrix} w'_1 \\ \vdots \\ w'_n \end{pmatrix}.$$

If we write the dependence on the basis as $Q \to F^{\mathcal{B}}$, we have then shown the following result.

Proposition 13.1.2 Given a quadratic form $Q$ on the finite dimensional real vector space $V$, with $F^{\mathcal{B}}$ and $F^{\mathcal{B}'}$ the matrices representing $Q$ on $V$ with respect to the bases $\mathcal{B}$ and $\mathcal{B}'$, it holds that

$$F^{\mathcal{B}'} = {}^tM^{\mathcal{B},\mathcal{B}'}\, F^{\mathcal{B}}\, M^{\mathcal{B},\mathcal{B}'}.$$

Corollary 13.1.3 Since the matrix $F^{\mathcal{B}}$ associated with the quadratic form $Q$ on $V$ for the basis $\mathcal{B}$ is symmetric, it is evident from the Proposition 4.1.20 that the matrix $F^{\mathcal{B}'}$ associated with $Q$ with respect to any other basis $\mathcal{B}'$ is symmetric as well.
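
The transformation rule and the corollary can be illustrated on a $2 \times 2$ example. In the sketch below `M` plays the role of the change of basis matrix, under the assumption that old components are obtained from new ones as $v = M v'$; the matrices, the vectors and the helper names are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def form(F, v, w):
    """Q(v, w) = sum over a, b of F_ab v_a w_b."""
    return sum(F[a][b] * v[a] * w[b] for a in range(len(v)) for b in range(len(w)))

F = [[2, 1], [1, 3]]   # symmetric matrix of Q in the old basis
M = [[1, 2], [0, 1]]   # invertible matrix: old components = M * new components
F_new = matmul(matmul(transpose(M), F), M)

v_new, w_new = [1, -1], [2, 1]
v_old, w_old = apply(M, v_new), apply(M, w_new)

print(F_new)                                               # symmetric, as the corollary states
print(form(F, v_old, w_old), form(F_new, v_new, w_new))    # the same number in both bases
```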

The Proposition 13.1.2 is the counterpart of the Proposition 7.9.9 which related the matrices of a linear map in different bases. This transformation is not the same as the one for the matrix of an endomorphism as described at the beginning of Chap. 9. To parallel the definition there, one is led to the following definition.

Definition 13.1.4 The symmetric matrices $A, B \in \mathbb{R}^{n,n}$ are called quadratically equivalent (or simply equivalent) if there exists a matrix $P \in \mathrm{GL}(n)$, such that $B = {}^tP A P$. Analogously, the quadratic forms $Q$ and $Q'$ defined on a real finite dimensional vector space $V$ are called equivalent if their representing matrices are (quadratically) equivalent.

Exercise 13.1.5 Let us consider the symmetric matrices


They are not similar, since for example $\det(A) = 2 \neq \det(B) = 3$ (recall that if two matrices are similar, then their determinants must coincide, from the Binet Theorem 5.1.16). They are indeed quadratically equivalent: the matrix


gives ${}^tPAP = B$.

In parallel with the Remark 9.1.4 concerning similarity of matrices, it is easy to show that the quadratic equivalence is an equivalence relation within the collection of symmetric matrices in $\mathbb{R}^{n,n}$. It is then natural to look for a canonical representative in any equivalence class.

Proposition 13.1.6 Any quadratic form $Q$ is equivalent to a diagonal quadratic form, that is one whose representing matrix is diagonal.

Proof This is just a consequence of the fact that symmetric matrices are orthogonally diagonalisable. From the Proposition 10.5.1 we know that for any symmetric matrix $A \in \mathbb{R}^{n,n}$ there exists a matrix $P$ which is orthogonal, that is $P^{-1} = {}^tP$, such that

$${}^tP A P = \Delta_A$$

where $\Delta_A$ is a diagonal matrix whose entries are the eigenvalues of $A$. □

Without any further requirements on the quadratic form, the matrix $\Delta_A$ may have a number $\mu$ of positive eigenvalues, a number $\nu$ of negative eigenvalues, and also the zero eigenvalue with multiplicity $m_0 = m_{\lambda=0}$. We can order the eigenvalues as follows:

$$ \Delta_A = \mathrm{diag}(\lambda_{p_1}, \dots, \lambda_{p_\mu}, \lambda_{n_1}, \dots, \lambda_{n_\nu}, 0, \dots, 0). $$

As in Exercise 13.1.5, we know that the diagonal matrix

$$ Q = \mathrm{diag}\left( \frac{1}{\sqrt{\lambda_{p_1}}}, \dots, \frac{1}{\sqrt{\lambda_{p_\mu}}}, \frac{1}{\sqrt{|\lambda_{n_1}|}}, \dots, \frac{1}{\sqrt{|\lambda_{n_\nu}|}}, 1, \dots, 1 \right) $$

is such that

$$ {}^tQ\, \Delta_A\, Q = \mathrm{diag}(1, \dots, 1, -1, \dots, -1, 0, \dots, 0) = D_A $$

with the expected multiplicities $\mu$ for $+1$, $\nu$ for $-1$ and $m_0$ for $0$. Since we are considering only transformations between real bases, these multiplicities are constant in each equivalence class of symmetric matrices.

For quadratic forms, this means that any quadratic form $Q$ on $V$ is equivalent to a diagonal one whose diagonal matrix has the entry $+1$ repeated $\mu$ times, the entry $-1$ repeated $\nu$ times and the entry $0$ repeated $m_0 = \dim(V) - \mu - \nu$ times. The multiplicities $\mu$ and $\nu$ depend only on the equivalence class. Equivalently, for a quadratic form $Q$ on $V$ there is a basis for $V$ with respect to which the matrix representing $Q$ is diagonal, with diagonal entries given by $+1$ repeated $\mu$ times, $-1$ repeated $\nu$ times and $0$ repeated $m_0$ times.

Definition 13.1.7 Given a symmetric matrix $A \in \mathbb{R}^{n,n}$, we call $D_A$ its canonical form (or reduced form). If $Q$ is a quadratic form on $\mathbb{R}^n$ whose matrix $F^{\mathcal{E}}$ is canonical, then one has

$$ Q(v, w) = v_{p_1}w_{p_1} + \dots + v_{p_\mu}w_{p_\mu} - (v_{n_1}w_{n_1} + \dots + v_{n_\nu}w_{n_\nu}) $$

with $v = (v_{p_1}, \dots, v_{p_\mu}, v_{n_1}, \dots, v_{n_\nu}, v_1, \dots, v_{m_0})$ and analogously for $w$. This is the canonical form for the quadratic form $Q$. The triple $\mathrm{sign}(Q) = (\mu, \nu, m_0)$ is called the signature of the quadratic form $Q$. In particular, the quadratic form $Q$ is called positive definite if $\mathrm{sign}(Q) = (\mu = n, 0, 0)$, and negative definite if $\mathrm{sign}(Q) = (0, \nu = n, 0)$.

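Exercise 13.1.8 On $V = \mathbb{R}^3$ consider the quadratic form

$$ Q(v, w) = v_1w_2 + v_2w_1 + v_1w_3 + v_3w_1 + v_2w_3 + v_3w_2 $$

where $v = (v_1, v_2, v_3)_{\mathcal{B}}$ and $w = (w_1, w_2, w_3)_{\mathcal{B}}$ with respect to a given basis $\mathcal{B} = (u_1, u_2, u_3)$. Its action is represented by the matrix

$$ F^{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}. $$

To diagonalise it, we compute its eigenvalues from the characteristic polynomial,

$$ p_{F^{\mathcal{B}}}(T) = -T^3 + 3T + 2 = (2 - T)(1 + T)^2. $$

The eigenvalue $\lambda = 2$ is simple, with eigenspace $V_{\lambda=2} = \mathcal{L}((1, 1, 1))$, while the eigenvalue $\lambda = -1$ has multiplicity $m_{\lambda=-1} = 2$, with corresponding eigenspace $V_{\lambda=-1} = \mathcal{L}((1, -1, 0), (1, 1, -2))$. If we define

$$ P = \frac{1}{\sqrt{6}} \begin{pmatrix} \sqrt{2} & \sqrt{3} & 1 \\ \sqrt{2} & -\sqrt{3} & 1 \\ \sqrt{2} & 0 & -2 \end{pmatrix} $$

we see that

$$ {}^tP F^{\mathcal{B}} P = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \Delta_A = F^{\mathcal{B}'} $$

with respect to the basis $\mathcal{B}' = (u'_1, u'_2, u'_3)$ of eigenvectors given explicitly by

$$ u'_1 = \tfrac{1}{\sqrt{3}}(u_1 + u_2 + u_3), \qquad u'_2 = \tfrac{1}{\sqrt{2}}(u_1 - u_2), \qquad u'_3 = \tfrac{1}{\sqrt{6}}(u_1 + u_2 - 2u_3). $$

With respect to the basis $\mathcal{B}'$ the quadratic form is written as

$$ Q(v, w) = 2v'_1w'_1 - (v'_2w'_2 + v'_3w'_3). $$

Motivated by Exercise 13.1.5, with the matrix

$$ Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$

we have that

$$ {}^tQ F^{\mathcal{B}'} Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = F^{\mathcal{B}''} $$

on the basis $\mathcal{B}'' = (u''_1 = \tfrac{1}{\sqrt{2}}u'_1,\ u''_2 = u'_2,\ u''_3 = u'_3)$. With respect to $\mathcal{B}''$ the quadratic form is

$$ Q(v, w) = v''_1w''_1 - v''_2w''_2 - v''_3w''_3, $$

in terms of the components of $v, w$ in the basis $\mathcal{B}''$. Its signature is $\mathrm{sign}(Q) = (1, 2, 0)$.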
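The signature can also be obtained without computing eigenvalues. The sketch below is not the method used in the text: it reduces a symmetric matrix to diagonal form by simultaneous row and column operations, each of which is a transformation $A \mapsto {}^tPAP$, and then counts the signs on the diagonal. This is legitimate because the multiplicities $(\mu, \nu, m_0)$ are constant in each equivalence class. The function name `signature` and the use of exact rational arithmetic are our own choices.

```python
from fractions import Fraction


def signature(matrix):
    """Return (mu, nu, m0) for a symmetric matrix with rational entries."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]

    def add_multiple(i, j, f):
        # Congruence step: row_i += f * row_j, then col_i += f * col_j.
        for c in range(n):
            a[i][c] += f * a[j][c]
        for r in range(n):
            a[r][i] += f * a[r][j]

    def swap(i, j):
        # Congruence step: exchange rows i, j and columns i, j.
        a[i], a[j] = a[j], a[i]
        for row in a:
            row[i], row[j] = row[j], row[i]

    for k in range(n):
        # Look for a non-zero diagonal entry in the remaining block.
        pivot = next((j for j in range(k, n) if a[j][j] != 0), None)
        if pivot is None:
            # The remaining diagonal vanishes: use an off-diagonal entry.
            pair = next(((i, j) for i in range(k, n)
                         for j in range(i + 1, n) if a[i][j] != 0), None)
            if pair is None:
                break  # the remaining block is zero
            i, j = pair
            add_multiple(i, j, Fraction(1))  # now a[i][i] = 2 * a[i][j]
            pivot = i
        if pivot != k:
            swap(k, pivot)
        # Clear row k and column k outside the diagonal.
        for i in range(k + 1, n):
            if a[i][k] != 0:
                add_multiple(i, k, -a[i][k] / a[k][k])

    diagonal = [a[i][i] for i in range(n)]
    mu = sum(1 for d in diagonal if d > 0)
    nu = sum(1 for d in diagonal if d < 0)
    return (mu, nu, n - mu - nu)


# The matrix of Exercise 13.1.8
print(signature([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # (1, 2, 0)
```

The diagonal entries produced by this reduction are in general not the eigenvalues; only their signs carry information.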

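Exercise 13.1.9 On the vector space $\mathbb{R}^4$ with canonical basis $\mathcal{E}$, consider the quadratic form

$$ Q(v, w) = v_1w_1 + v_2w_2 + v_1w_2 + v_2w_1 + v_3w_4 + v_4w_3 - v_3w_3 - v_4w_4, $$

for any two vectors $v, w$ in $\mathbb{R}^4$. Its representing matrix is

$$ F^{\mathcal{E}} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}, $$

which has been already studied in Exercise 10.5.3. We can then immediately write

$$ {}^tP F^{\mathcal{E}} P = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix}, \qquad P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 1 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{pmatrix}, $$

with the basis $\mathcal{E}' = (e'_1, e'_2, e'_3, e'_4)$ given by

$$ e'_1 = \tfrac{1}{\sqrt{2}}(e_1 - e_2), \qquad e'_2 = \tfrac{1}{\sqrt{2}}(e_3 + e_4), \qquad e'_3 = \tfrac{1}{\sqrt{2}}(e_1 + e_2), \qquad e'_4 = \tfrac{1}{\sqrt{2}}(e_3 - e_4). $$

Here $e'_3$ is an eigenvector for the eigenvalue $2$ and $e'_4$ is an eigenvector for the eigenvalue $-2$, as one checks by applying $F^{\mathcal{E}}$ to them. With respect to the basis $\mathcal{E}'' = (e''_1 = e'_1,\ e''_2 = e'_2,\ e''_3 = \tfrac{1}{\sqrt{2}}e'_3,\ e''_4 = \tfrac{1}{\sqrt{2}}e'_4)$ it is clear that the matrix representing the action of $Q$ is $F^{\mathcal{E}''} = \mathrm{diag}(0, 0, 1, -1)$, so that the canonical form of the quadratic form $Q$ reads

$$ Q(v, w) = v''_3w''_3 - v''_4w''_4 $$

with $v = (v''_1, v''_2, v''_3, v''_4)_{\mathcal{E}''}$ and analogously for $w$. Its signature is $\mathrm{sign}(Q) = (1, 1, 2)$.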

Remark 13.1.10 Once the dimension $n$ of the real vector space $V$ is fixed, the collection of inequivalent quadratic forms, that is the quotient of the symmetric matrices by the quadratic equivalence relation of Definition 13.1.4, is labelled by the possible signatures of the quadratic forms, or equivalently by the signatures of the symmetric matrices, written as $\mathrm{sign}(Q) = (\mu, \nu, n - \mu - \nu)$.

Finally, we state the conditions for a quadratic form to provide a scalar product for a finite dimensional real vector space $V$. Since we have discussed the topic at length, we omit the proof of the following proposition.

Proposition  13.1.11  A  quadratic  form  Q  on  a  finite  dimensional  real  vector  space 
V  provides  a  scalar  product  if  and  only  if  it  is  positive  definite.  In  such  a  case  we 
denote  the  scalar  product  by 

v  •  w  =  Q(v,  w). 


Exercise 13.1.12 With respect to the canonical basis $\mathcal{E}$ on $\mathbb{R}^2$ we consider the quadratic form

$$ Q(v, w) = a\,v_1w_1 + v_1w_2 + v_2w_1, \qquad \text{with } a \in \mathbb{R}, $$

for $v = (v_1, v_2)$ and $w = (w_1, w_2)$. The matrix representing $Q$ is given by

$$ F^{\mathcal{E}} = \begin{pmatrix} a & 1 \\ 1 & 0 \end{pmatrix} $$

and its characteristic polynomial, $p_{F^{\mathcal{E}}}(T) = T^2 - aT - 1$, gives eigenvalues

$$ \lambda_\pm = \tfrac{1}{2}\left(a \pm \sqrt{a^2 + 4}\right). $$

Since for any real value of $a$ there is one positive eigenvalue and one negative eigenvalue (their product is $-1$), we conclude that the signature of the quadratic form is $\mathrm{sign}(Q) = (1, 1, 0)$.

Exercise 13.1.13 Consider, from Exercise 11.1.11, the three dimensional vector space $V$ of antisymmetric matrices in $\mathbb{R}^{3,3}$. If we set

$$ Q(L, L') = -\tfrac{1}{2}\,\mathrm{tr}(LL') $$

with $L, L' \in V$, it is immediate to verify that $Q$ is a quadratic form. Also, the basis elements $L_a$ given in Exercise 11.1.11 are orthonormal,

$$ Q(L_a, L_b) = \delta_{ab}. $$

Then, the space of real antisymmetric $3 \times 3$ matrices is a euclidean space for this scalar product.

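Exercise 13.1.14 On $\mathbb{R}^2$, again with the canonical basis, we consider the quadratic form

$$ Q(v, w) = v_1w_1 + v_2w_2 + a(v_1w_2 + v_2w_1), \qquad \text{with } a \in \mathbb{R}, $$

whose representing matrix is

$$ F^{\mathcal{E}} = \begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}. $$

Its characteristic polynomial is $p_{F^{\mathcal{E}}}(T) = (1 - T)^2 - a^2 = (1 - T - a)(1 - T + a)$, with eigenvalues

$$ \lambda_\pm = 1 \pm a. $$
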

We  have  the  following  cases: 

•  for  a  >  1,  it  is  sign(Q)  =  (1,  1,0); 

•  for  a  =  ±1,  it  is  sign(Q)  =  (1,  0,  1); 

•  for  a  <  — 1,  it  is  sign(Q)  =  (1,  1,  0); 

•  for  —1  <  a  <  1,  it  is  sign(Q)  =  (2,  0,  0). 

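In this last case, the quadratic form endows $\mathbb{R}^2$ with a scalar product. The eigenspaces are

$$ \lambda_- = (1 - a), \quad V_{\lambda_-} = \mathcal{L}((1, -1)), \qquad \lambda_+ = (1 + a), \quad V_{\lambda_+} = \mathcal{L}((1, 1)), $$

so we can define the matrix

$$ M^{\mathcal{E}',\mathcal{E}} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} $$

which gives

$$ {}^tM^{\mathcal{E}',\mathcal{E}}\, F^{\mathcal{E}}\, M^{\mathcal{E}',\mathcal{E}} = \begin{pmatrix} 1 - a & 0 \\ 0 & 1 + a \end{pmatrix}. $$

With respect to the basis $\mathcal{E}' = (e'_1 = \tfrac{1}{\sqrt{2}}(e_1 - e_2),\ e'_2 = \tfrac{1}{\sqrt{2}}(e_1 + e_2))$ the quadratic form is

$$ Q(v, w) = (1 - a)v'_1w'_1 + (1 + a)v'_2w'_2. $$

We obtain the canonical form for $Q$ if we consider the basis $\mathcal{E}''$ given by

$$ \mathcal{E}'' = \left( e''_1 = \frac{1}{\sqrt{1 - a}}\,e'_1,\ e''_2 = \frac{1}{\sqrt{1 + a}}\,e'_2 \right). $$

The basis $\mathcal{E}''$ is orthonormal with respect to the scalar product defined by $Q$.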
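The case distinction of Exercise 13.1.14 follows directly from the signs of the eigenvalues $\lambda_\pm = 1 \pm a$. A minimal sketch, with a function name of our own choosing, makes this explicit:

```python
def signature_exercise_13_1_14(a):
    """Signature (mu, nu, m0) of Q from the eigenvalues 1 - a and 1 + a."""
    eigenvalues = (1 - a, 1 + a)
    mu = sum(1 for x in eigenvalues if x > 0)   # positive eigenvalues
    nu = sum(1 for x in eigenvalues if x < 0)   # negative eigenvalues
    return (mu, nu, 2 - mu - nu)


for a in (2, 1, -1, -3, 0):
    print(a, signature_exercise_13_1_14(a))
```

Only for $-1 < a < 1$ is the result $(2, 0, 0)$, which is the case in which $Q$ gives a scalar product.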

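Exercise 13.1.15 This exercise puts the results of the previous one in a more general context.

(a) From Exercise 13.1.14 we know that the symmetric matrix

$$ S = \begin{pmatrix} 1 & a \\ a & 1 \end{pmatrix}, $$

with $a \in \mathbb{R}$, is quadratically equivalent to the diagonal matrix

$$ S' = \begin{pmatrix} 1 - a & 0 \\ 0 & 1 + a \end{pmatrix}. $$

Let us consider $S$ and $S'$ as matrices in $\mathbb{C}^{2,2}$ with real entries (recall that $\mathbb{R}$ is a subfield of $\mathbb{C}$). We can then write

$$ \begin{pmatrix} (1 - a)^{-1/2} & 0 \\ 0 & (1 + a)^{-1/2} \end{pmatrix} \begin{pmatrix} 1 - a & 0 \\ 0 & 1 + a \end{pmatrix} \begin{pmatrix} (1 - a)^{-1/2} & 0 \\ 0 & (1 + a)^{-1/2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} $$

for any $a \in \mathbb{R}$ with $a \neq \pm 1$. This means that, by complexifying the entries of the real symmetric matrix $S$, there exists a transformation

$$ S \mapsto {}^tP S P = I_2 $$

with $P \in \mathrm{GL}(n, \mathbb{C})$ (the group of invertible $n \times n$ complex matrices), which transforms $S$ to $I_2$.

(b) From Exercise 13.1.8 we know that the symmetric matrix

$$ S = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} $$

is quadratically equivalent to

$$ S' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. $$

By again considering them as complex matrices, we can write

$$ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & i \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & i \end{pmatrix}. $$

Thus, $S$ is quadratically equivalent to $I_3$ via an invertible matrix $P \in \mathbb{C}^{n,n}$.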
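The computation in part (b) can be checked with complex arithmetic. The following sketch (the helper names are ours) uses the plain transpose, with no complex conjugation, and also verifies the identity $(iI_3)\,S\,(iI_3) = -S$, which shows that over $\mathbb{C}$ a symmetric matrix is quadratically equivalent to its opposite.

```python
def transpose(m):
    """Plain transpose (no complex conjugation)."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def congruence(p, a):
    """Return tP A P."""
    return matmul(matmul(transpose(p), a), p)


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
S = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]            # matrix of Exercise 13.1.8
S_PRIME = [[1, 0, 0], [0, -1, 0], [0, 0, -1]]    # its real canonical form
P = [[1, 0, 0], [0, 1j, 0], [0, 0, 1j]]          # complex diagonal matrix
I_TIMES_ID = [[1j, 0, 0], [0, 1j, 0], [0, 0, 1j]]
MINUS_S = [[-x for x in row] for row in S]

print(congruence(P, S_PRIME) == IDENTITY)    # True
print(congruence(I_TIMES_ID, S) == MINUS_S)  # True
```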

If $A$ is a symmetric matrix with real entries, from Proposition 13.1.6 we know that it is quadratically equivalent to

$$ \Delta_A = \mathrm{diag}(\lambda_{p_1}, \dots, \lambda_{p_\mu}, \lambda_{n_1}, \dots, \lambda_{n_\nu}, 0, \dots, 0), $$

with $\lambda_{p_j} > 0$ and $\lambda_{n_j} < 0$. Given the invertible matrix

$$ P = \mathrm{diag}\left( |\lambda_{p_1}|^{-1/2}, \dots, |\lambda_{p_\mu}|^{-1/2},\ i|\lambda_{n_1}|^{-1/2}, \dots, i|\lambda_{n_\nu}|^{-1/2},\ 1, \dots, 1 \right) $$

in $\mathbb{C}^{n,n}$, one finds that

$$ {}^tP \Delta_A P = \mathrm{diag}(1, \dots, 1, 0, \dots, 0) $$

where the number of non zero terms $+1$ is given by the rank of $A$.

If we now define that two symmetric matrices $A, B \in \mathbb{C}^{n,n}$ are quadratically equivalent if there exists a matrix $P \in \mathrm{GL}(n, \mathbb{C})$ such that $B = {}^tP A P$, we can conclude that any real symmetric matrix $A$ is quadratically equivalent to a diagonal matrix $D_A$ as above.

The diagonal matrix $D_A$ above gives a canonical form for $A$ with respect to quadratic equivalence after complexification. Notice that, since $(iI_n)A(iI_n) = -A$, we have that $A$ is quadratically equivalent to $-A$. This means that a notion of complex signature does not carry much information, since it cannot measure the signs of the eigenvalues of $A$, but only its rank. If $A = {}^tA = \bar{A}$, then we set $\mathrm{sign}(A) = (\mathrm{rk}(A), \dim\ker(A))$.

We conclude by observing that what we have sketched above gives the main properties of a real quadratic form on a complex finite dimensional vector space, whose definition is as follows.

Definition 13.1.16 A real quadratic form on a complex finite dimensional vector space is a map

$$ \mathcal{S} : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}, \qquad (v, w) \mapsto \mathcal{S}(v, w) $$

such that, for any $v, w, v_1, v_2 \in \mathbb{C}^n$ and $a_1, a_2 \in \mathbb{C}$ it holds that:

(S1) $\mathcal{S}(v, w) = \mathcal{S}(w, v)$,

(S2) $\mathcal{S}(v, w) \in \mathbb{R}$ if and only if $v = \bar{v}$ and $w = \bar{w}$,

(S3) $\mathcal{S}((a_1v_1 + a_2v_2), w) = a_1\mathcal{S}(v_1, w) + a_2\mathcal{S}(v_2, w)$.

It is clear that $\mathcal{S}$ is a real quadratic form on $\mathbb{C}^n$ if and only if there exists a real basis $\mathcal{B}$ for $\mathbb{C}^n$, that is a basis which is invariant under complex conjugation, with respect to which the matrix $S^{\mathcal{B}} \in \mathbb{C}^{n,n}$ representing $\mathcal{S}$ is symmetric with real entries.

In order to have a more elaborate notion of signature for a bilinear form on complex vector spaces, one needs the notion of hermitian form as explained in the next section.


13.2  Quadratic  Forms  on  Complex  Vector  Spaces 

It is straightforward to generalise to $\mathbb{C}^n$ the main results of the theory of quadratic forms on $\mathbb{R}^n$. The following definition comes naturally after Sects. 3.4 and 8.2.

Definition 13.2.1 Let $V$ be a finite dimensional complex vector space. A hermitian form on $V$ is a map

$$ \mathcal{H} : V \times V \to \mathbb{C}, \qquad (v, w) \mapsto \mathcal{H}(v, w) $$

that fulfils the following properties. For any $v, w, v_1, v_2 \in V$ and $a_1, a_2 \in \mathbb{C}$ it holds that:

(H1) $\mathcal{H}(v, w) = \overline{\mathcal{H}(w, v)}$,

(H2) $\mathcal{H}((a_1v_1 + a_2v_2), w) = \bar{a}_1\mathcal{H}(v_1, w) + \bar{a}_2\mathcal{H}(v_2, w)$.

When a hermitian form is positive definite, that is for any $v \in V$ the additional conditions

(E1) $\mathcal{H}(v, v) \geq 0$;

(E2) $\mathcal{H}(v, v) = 0 \Leftrightarrow v = 0_V$

are satisfied, then $\mathcal{H}$ is a hermitian product, and we say that $V$ is a hermitian space. We list the properties of hermitian forms in parallel with those of the real case.

(a) With respect to any given basis $\mathcal{B} = (u_1, \dots, u_n)$ of $V$, the conditions H1 and H2 are satisfied if and only if there exists a selfadjoint matrix $H = (H_{ab}) \in \mathbb{C}^{n,n}$, $H = H^\dagger$, such that

$$ \mathcal{H}(v, w) = \sum_{a,b=1}^{n} H_{ab}\,\bar{v}_a w_b. $$

If we denote by $H^{\mathcal{B}}$ the dependence on the basis of $V$ for the matrix giving the action of $\mathcal{H}$, under a change of bases $\mathcal{B} \to \mathcal{B}'$ we have

$$ H^{\mathcal{B}'} = (M^{\mathcal{B}',\mathcal{B}})^\dagger\, H^{\mathcal{B}}\, M^{\mathcal{B}',\mathcal{B}} = (H^{\mathcal{B}'})^\dagger. \qquad (13.1) $$

(b) Two selfadjoint matrices $A, B \in \mathbb{C}^{n,n}$ are defined to be equivalent if there exists an invertible matrix $P$ such that $B = P^\dagger A P$. This is an equivalence relation within the set of selfadjoint matrices. Analogously, two hermitian forms $\mathcal{H}$ and $\mathcal{H}'$ on $\mathbb{C}^n$ are defined to be equivalent if their representing matrices are equivalent.

(c) From the spectral theory for selfadjoint matrices it is clear that any hermitian form $\mathcal{H}$ is equivalent to a hermitian form whose representing matrix is diagonal. Referring to the relation (13.1), there exists a unitary matrix $U = M^{\mathcal{B}',\mathcal{B}}$ of the change of basis from $\mathcal{B}$ to $\mathcal{B}'$ such that $H^{\mathcal{B}'} = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, with $\lambda_j \in \mathbb{R}$ giving the spectrum of $H^{\mathcal{B}}$.

(d) The matrix $H^{\mathcal{B}'}$ is further reduced to its canonical form via the same conjugation operation described for the real case after Proposition 13.1.6.

Since, as in the real case, no conjugation as in (13.1) can alter the signs of the eigenvalues of a given selfadjoint matrix, the notion of signature is meaningful for hermitian forms. Such a signature characterises equivalence classes of selfadjoint matrices (and then of hermitian forms) via the equivalence relation we are considering.

(e) A hermitian form $\mathcal{H}$ equips $\mathbb{C}^n$ with a hermitian product if and only if it is positive definite.

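Exercise 13.2.2 On $\mathbb{C}^2$ we consider the basis $\mathcal{B} = (u_1, u_2)$ and the hermitian form

$$ \mathcal{H}(v, w) = a(\bar{v}_1w_1 + \bar{v}_2w_2) + ib(\bar{v}_1w_2 - \bar{v}_2w_1), \qquad \text{with } a, b \in \mathbb{R}, $$

for $v = (v_1, v_2)_{\mathcal{B}}$ and $w = (w_1, w_2)_{\mathcal{B}}$. The hermitian form is represented by the matrix

$$ H^{\mathcal{B}} = \begin{pmatrix} a & ib \\ -ib & a \end{pmatrix} = (H^{\mathcal{B}})^\dagger. $$

The spectral resolution of this matrix gives

$$ \lambda_\pm = a \pm b, \qquad V_{\lambda_\pm} = \mathcal{L}(u_\pm) $$

with normalised eigenvectors

$$ u_\pm = \tfrac{1}{\sqrt{2}}(\pm i, 1)_{\mathcal{B}}, $$

and with respect to the basis $\mathcal{B}' = (b'_1 = u_+,\ b'_2 = u_-)$ one finds

$$ H^{\mathcal{B}'} = \begin{pmatrix} a + b & 0 \\ 0 & a - b \end{pmatrix}. $$

We reduce the hermitian form $\mathcal{H}$ to its canonical form by defining a basis

$$ \mathcal{B}'' = \left( \frac{1}{\sqrt{|a + b|}}\,b'_1,\ \frac{1}{\sqrt{|a - b|}}\,b'_2 \right) $$

so to have

$$ H^{\mathcal{B}''} = \begin{pmatrix} \frac{a + b}{|a + b|} & 0 \\ 0 & \frac{a - b}{|a - b|} \end{pmatrix}. $$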

We see that the signature of $\mathcal{H}$ depends on the signs of $a + b$ and $a - b$. It endows $\mathbb{C}^2$ with a hermitian product if and only if both are positive, that is if and only if $a > |b|$, with $\mathcal{B}''$ giving an orthonormal basis for it.
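The spectral data of Exercise 13.2.2 can be checked numerically. The sketch below assumes that the form is antilinear in its first argument, $\mathcal{H}(v, w) = \sum_{a,b} H_{ab}\,\bar{v}_a w_b$, and evaluates it on the eigenvectors $u_\pm = \tfrac{1}{\sqrt{2}}(\pm i, 1)$. The function names are ours.

```python
import math


def hermitian_form(v, w, a, b):
    """H(v, w) = a(conj(v1) w1 + conj(v2) w2) + i b (conj(v1) w2 - conj(v2) w1)."""
    v1, v2 = (complex(x).conjugate() for x in v)
    w1, w2 = w
    return a * (v1 * w1 + v2 * w2) + 1j * b * (v1 * w2 - v2 * w1)


def is_hermitian_product(a, b):
    """Positive definite exactly when both eigenvalues a + b and a - b are positive."""
    return a + b > 0 and a - b > 0


r = 1 / math.sqrt(2)
u_plus = (1j * r, r)     # eigenvector for a + b
u_minus = (-1j * r, r)   # eigenvector for a - b

a, b = 2.0, 0.5
print(hermitian_form(u_plus, u_plus, a, b))    # close to a + b = 2.5
print(hermitian_form(u_minus, u_minus, a, b))  # close to a - b = 1.5
print(hermitian_form(u_plus, u_minus, a, b))   # close to 0
```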


13.3  The  Minkowski  Spacetime 

We now describe the quadratic form used for a geometrical description of electromagnetism and for the special theory of relativity.

Let $V$ be a four dimensional real vector space equipped with a quadratic form $Q$ with signature $\mathrm{sign}(Q) = (3, 1, 0)$. From the theory we have developed in Sect. 13.1 there exists a (canonical) basis $\mathcal{E} = (e_0, e_1, e_2, e_3)$ with respect to which the action of $Q$ is given by¹

$$ Q(v, w) = -v_0w_0 + v_1w_1 + v_2w_2 + v_3w_3 $$

with $v = (v_0, v_1, v_2, v_3)$ and $w = (w_0, w_1, w_2, w_3)$.

Definition 13.3.1 The equivalence class of quadratic forms on $\mathbb{R}^4$ characterised by the signature $(3, 1, 0)$ is said to provide $\mathbb{R}^4$ with a Minkowski quadratic form, that we denote by $\eta$. The datum $(\mathbb{R}^4, \eta)$ is called the Minkowski spacetime, using the name from physics. We shall denote it by $M^4$ and, with a slight abuse of terminology, we shall also denote the action of $\eta$ as a scalar product

$$ v \cdot w = \eta(v, w) $$

and refer to it as the (Minkowski) scalar product in $M^4$.


¹ The reason why we denote the first element by $e_0$ and the corresponding component of a vector $v$ by $v_0$ comes from physics, since such a component is identified with the time coordinate of an event.

Definition 13.3.2 We list the natural generalisations to $M^4$ of well known definitions in $E^n$.

(a) For any $v \in (\mathbb{R}^4, \eta)$, the quantity $\|v\|^2 = v \cdot v$ is the square of the (Minkowski) norm of $v \in M^4$;

the vector $v$ is called space-like if $\|v\|^2 > 0$,
the vector $v$ is called light-like if $\|v\|^2 = 0$,
the vector $v$ is called time-like if $\|v\|^2 < 0$.

(b) Two vectors $v, w \in M^4$ are orthogonal if $v \cdot w = 0$; thus a light-like vector is orthogonal to itself.

(c) A basis $\mathcal{B}$ for $M^4$ is orthonormal if the action of $\eta$ with respect to $\mathcal{B}$ is diagonal, that is if and only if the matrix $\eta^{\mathcal{B}}$ has the form

$$ \eta^{\mathcal{B}} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. $$

We simply denote $\eta_{\mu\nu} = (\eta^{\mathcal{B}})_{\mu\nu}$ with $\mathcal{B}$ orthonormal.

(d) A matrix $A \in \mathbb{R}^{4,4}$ is a Lorentz matrix if its columns yield an orthonormal basis for $M^4$.

We omit the proof of the following results, which generalise to $M^4$ analogous results valid in $E^n$.

Proposition 13.3.3 Let $\mathcal{B} = (e_0, e_1, e_2, e_3)$ be an orthonormal basis for $M^4$, with $A \in \mathbb{R}^{4,4}$ and $\phi \in \mathrm{End}(M^4)$.

(a) The matrix $A$ is a Lorentz matrix if and only if ${}^tA\,\eta\,A = \eta$.

(b) It holds that $\phi(v) \cdot \phi(w) = v \cdot w$ for any $v, w \in M^4$ if and only if the matrix representing $\phi$ with respect to $\mathcal{B}$ is a Lorentz matrix.

(c) The system $\mathcal{B}' = (\phi(e_0), \dots, \phi(e_3))$ is an orthonormal basis for $M^4$ if and only if for any $v, w \in M^4$ one has $\phi(v) \cdot \phi(w) = v \cdot w$, that is if and only if

$$ \phi(e_\mu) \cdot \phi(e_\nu) = e_\mu \cdot e_\nu = \eta_{\mu\nu}. $$

As an immediate consequence of this proposition, one proves that, if $u \in M^4$ is a space-like vector, there exists an orthonormal basis $\mathcal{B}'$ for $M^4$ with respect to which $u = (0, u'_1, u'_2, u'_3)_{\mathcal{B}'}$. Analogously, if $u$ is a time-like vector, there exists a basis $\mathcal{B}''$ with respect to which $u = (u''_0, 0, 0, 0)_{\mathcal{B}''}$.

Indeed it is straightforward to prove that the set of Lorentz matrices forms a group, for matrix multiplication, denoted by $\mathrm{O}(3, 1)$ and called the Lorentz group. If the endomorphism $\phi$ is represented, with respect to an orthonormal basis for $M^4$, by a Lorentz matrix, then $\phi$ is said to be a Lorentz transformation. This means that the set of Lorentz transformations is a group isomorphic to the Lorentz group.

Example 13.3.4 In the special theory of relativity the position of a point mass at a given time $t$ is represented with a vector $v = (v_0 = ct, x_1, x_2, x_3)_{\mathcal{B}}$ in $M^4$ with respect to an orthonormal basis $\mathcal{B}$, with $(x_1, x_2, x_3)$ giving the so called spatial components of $v$ and $c$ denoting the speed of light. Such a vector $v$ is also called an event. The linear map

$$ \begin{pmatrix} v_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} \mapsto \begin{pmatrix} v'_0 \\ x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_0 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix} $$

with

$$ \beta = v/c \qquad \text{and} \qquad \gamma = (1 - \beta^2)^{-1/2}, $$

yields the components of the vector $v$ with respect to an orthonormal basis $\mathcal{B}'$ corresponding to an inertial reference system (an inertial observer) which is moving with constant spatial velocity $v$ along the direction $e_1$. Notice that, since $c$ is a limit value for the velocity, we have $|\beta| < 1$ and then $\gamma \geq 1$. It is easy to see that this map is a Lorentz transformation, and that the matrix gives the change of basis $M^{\mathcal{B}',\mathcal{B}}$ in $M^4$.
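That the matrix of Example 13.3.4 is a Lorentz matrix can be checked through the criterion ${}^tA\,\eta\,A = \eta$ of Proposition 13.3.3. The following sketch does this in floating point for a sample value of $\beta$; the function names and the tolerance are our own choices.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def boost_e1(beta):
    """Matrix of Example 13.3.4 for a velocity beta * c along e_1."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[gamma, -beta * gamma, 0.0, 0.0],
            [-beta * gamma, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def is_lorentz(m, tol=1e-12):
    """Check tM eta M = eta entry by entry, up to the tolerance tol."""
    lhs = matmul(matmul(transpose(m), ETA), m)
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))


def minkowski(v, w):
    """Minkowski scalar product in an orthonormal basis."""
    return -v[0] * w[0] + sum(a * b for a, b in zip(v[1:], w[1:]))


def apply(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


A = boost_e1(0.6)
event = [5.0, 3.0, 1.0, 2.0]   # hypothetical components (ct, x1, x2, x3)
print(is_lorentz(A))
print(minkowski(event, event), minkowski(apply(A, event), apply(A, event)))
```

The last line prints the squared Minkowski norm of the event before and after the transformation; the two values agree up to rounding.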

From the identity ${}^tA\,\eta\,A = \eta$ one gets $\det A = \pm 1$ for a Lorentz matrix $A$. The set of Lorentz matrices whose determinant is positive is the (sub)group $\mathrm{SO}(3, 1)$ of proper Lorentz matrices.

If $A_{\mu\nu}$ denotes the entries of a Lorentz matrix $A$, then from the same identity we can write that

$$ -A_{00}^2 + \sum_{k=1}^{3} A_{k0}^2 = -1 \qquad \text{and} \qquad -A_{00}^2 + \sum_{k=1}^{3} A_{0k}^2 = -1, $$

thus proving that $A_{00}^2 \geq 1$. Lorentz matrices with $A_{00} \geq 1$ are called orthochronous. We omit the proof that the set of orthochronous Lorentz matrices forms a group as well. Proper and orthochronous Lorentz matrices form therefore a group, that we denote by

$$ \mathrm{SO}(3, 1)^\uparrow = \{A \in \mathrm{O}(3, 1) \,:\, \det A = 1,\ A_{00} \geq 1\}. $$


Notice that the Lorentz matrix given in Example 13.3.4 is proper and orthochronous. Given the physical interpretation of the components of a vector in $M^4$ mentioned before, it is natural to call the endomorphisms represented by the Lorentz matrices

$$ P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \qquad T = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

the (spatial) parity and the time reversal. The matrix $P$ is improper and orthochronous, while $T$ is improper and antichronous.

We can generalise the final remark from Example 11.3.1 to the Lorentz group case. If $A$ is an improper orthochronous Lorentz matrix, then it is given by the product $PA'$ with $A' \in \mathrm{SO}(3, 1)^\uparrow$. If $A$ is an improper antichronous Lorentz matrix, then it is given by the product $TA'$ with $A' \in \mathrm{SO}(3, 1)^\uparrow$. If $A$ is the product $PTA'$ with $A' \in \mathrm{SO}(3, 1)^\uparrow$, it is called a proper antichronous Lorentz matrix.

Let us describe the structure of the group $\mathrm{SO}(3, 1)^\uparrow$ in more detail. Firstly, notice that if $R \in \mathrm{SO}(3)$ then all matrices of the form

$$ A_R = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} $$

(a block matrix, with $R$ as the lower right $3 \times 3$ block) are elements in $\mathrm{SO}(3, 1)^\uparrow$. The set of such matrices $A_R$ is clearly isomorphic to the group $\mathrm{SO}(3)$, so we can refer to $\mathrm{SO}(3)$ as the subgroup of spatial rotations within the Lorentz group.

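The Lorentz matrix in Example 13.3.4 is not such a rotation. From Exercise 11.2.3 we write

$$ e^{uS_1} = \begin{pmatrix} \gamma & \beta\gamma & 0 & 0 \\ \beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad \text{with} \qquad S_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, $$

with $\sinh u = \beta\gamma$ and $\cosh u = \gamma$, so that $\tanh u = v/c$.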
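The exponential $e^{uS_1}$ can be checked against the closed form above by summing the power series of the matrix exponential. The sketch below is a plain truncated series, adequate for the small matrices considered here but not a general purpose algorithm; the function names and the number of terms are our own choices.

```python
import math


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def mat_exp(m, terms=40):
    """Approximate e^m by the truncated series sum of m^k / k! for k < terms."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds m^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


S1 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]


def boost_from_rapidity(u):
    """Return e^{u S_1}."""
    return mat_exp([[u * x for x in row] for row in S1])


beta = 0.6
u = math.atanh(beta)                    # tanh u = v / c
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
m = boost_from_rapidity(u)
print(m[0][0], gamma)          # both close to cosh u
print(m[0][1], beta * gamma)   # both close to sinh u
```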

We therefore have a closer look at the exponential of symmetric matrices of the form

$$ S(u) = \begin{pmatrix} 0 & u_1 & u_2 & u_3 \\ u_1 & 0 & 0 & 0 \\ u_2 & 0 & 0 & 0 \\ u_3 & 0 & 0 & 0 \end{pmatrix} = u_1S_1 + u_2S_2 + u_3S_3, \qquad (13.2) $$

with $u = (u_1, u_2, u_3)$ a triple of real parameters. If the matrix $R = (R_{ij})$ represents a spatial rotation, a direct computation shows that

$$ A_{R^{-1}}\, S(u)\, A_R = \begin{pmatrix} 0 & \sum_{k=1}^{3} R_{k1}u_k & \sum_{k=1}^{3} R_{k2}u_k & \sum_{k=1}^{3} R_{k3}u_k \\ \sum_{k=1}^{3} R_{k1}u_k & 0 & 0 & 0 \\ \sum_{k=1}^{3} R_{k2}u_k & 0 & 0 & 0 \\ \sum_{k=1}^{3} R_{k3}u_k & 0 & 0 & 0 \end{pmatrix}. $$

We see that $u = (u_1, u_2, u_3)$ transforms like a vector in a three dimensional euclidean space, and therefore we write the identity above as

$$ S(R^{-1}u) = A_{R^{-1}}\, S(u)\, A_R. $$

This identity allows us to write (see Proposition 11.2.2)

$$ e^{S(u)} = A_{R^{-1}}\, e^{S(Ru)}\, A_R. $$

If $R$ is a proper rotation mapping $u \mapsto (\|u\|_E, 0, 0)$, with $\|u\|_E^2 = u_1^2 + u_2^2 + u_3^2$ the square of the euclidean three-norm, we get

$$ e^{S(u)} = A_{R^{-1}}\, e^{\|u\|_E S_1}\, A_R. $$


Alternatively, one shows by direct computations that

$$ S^2(u) = \begin{pmatrix} \|u\|_E^2 & 0 \\ 0 & Q \end{pmatrix} \qquad \text{and} \qquad S^3(u) = \|u\|_E^2\, S(u), $$

and more generally

$$ S^{2k}(u) = \|u\|_E^{2(k-1)}\, S^2(u), \qquad S^{2k+1}(u) = \|u\|_E^{2k}\, S(u), $$

where the block $Q \in \mathbb{R}^{3,3}$ has entries $Q_{ij} = u_iu_j$, so that $Q^2 = \|u\|_E^2\, Q$. These identities give then

$$ e^{S(u)} = I_4 + \frac{1}{\|u\|_E^2}\left(\cosh\|u\|_E - 1\right) S^2(u) + \frac{1}{\|u\|_E}\,\sinh\|u\|_E\; S(u). $$

It is easy to show that $e^{S(u)} \in \mathrm{SO}(3, 1)^\uparrow$. Such transformations are called Lorentz boosts, or hyperbolic rotations. They give the matrices of change of bases $M^{\mathcal{B}',\mathcal{B}}$ where $\mathcal{B}'$ is the orthonormal basis corresponding to an inertial reference system moving with constant velocity $v = (v_1, v_2, v_3)$ in the physical euclidean three dimensional space with respect to the reference system represented by $\mathcal{B}$, by identifying for the velocity,

$$ c\,(\tanh\|u\|_E) = \|v\|_E. $$
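The closed formula for $e^{S(u)}$ can be compared with the power series of the exponential, and the Lorentz condition ${}^tA\,\eta\,A = \eta$ can be verified on the result. The sketch below does both for a hypothetical triple $u$; the function names, the number of series terms and the tolerances are our own choices.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def mat_exp(m, terms=40):
    """Truncated power series for e^m."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def S(u):
    """The symmetric matrix S(u) of (13.2)."""
    u1, u2, u3 = u
    return [[0.0, u1, u2, u3], [u1, 0.0, 0.0, 0.0],
            [u2, 0.0, 0.0, 0.0], [u3, 0.0, 0.0, 0.0]]


def boost(u):
    """Closed form I + (cosh n - 1)/n^2 S^2 + sinh(n)/n S, with n the norm of u."""
    n = math.sqrt(sum(x * x for x in u))
    identity = [[float(i == j) for j in range(4)] for i in range(4)]
    if n == 0.0:
        return identity
    s = S(u)
    s2 = matmul(s, s)
    c2 = (math.cosh(n) - 1.0) / n ** 2
    c1 = math.sinh(n) / n
    return [[identity[i][j] + c2 * s2[i][j] + c1 * s[i][j] for j in range(4)]
            for i in range(4)]


def is_lorentz(m, tol=1e-12):
    lhs = matmul(matmul(transpose(m), ETA), m)
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))


u = (0.3, -0.4, 1.2)   # hypothetical parameters, with norm 1.3
closed, series = boost(u), mat_exp(S(u))
print(max(abs(closed[i][j] - series[i][j]) for i in range(4) for j in range(4)))
print(is_lorentz(closed), closed[0][0] >= 1.0)
```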


From the properties of the group $\mathrm{SO}(3)$ we know that each proper spatial rotation is the exponential of a suitable antisymmetric matrix, that is $A_R = e^{\tilde{L}}$ where $\tilde{L}$ is an element in the three dimensional vector space spanned by the matrices $\tilde{L}_j \in \mathbb{R}^{4,4}$ of the form

$$ \tilde{L}_j = \begin{pmatrix} 0 & 0 \\ 0 & L_j \end{pmatrix} $$

(a block matrix, with $L_j$ as the lower right $3 \times 3$ block), with the antisymmetric matrices $L_j$, $j = 1, 2, 3$, those of Exercise 11.1.10, the generators of the Lie algebra $\mathfrak{so}(3)$. With the symmetric matrices $S_j$ in (13.2), we compute the commutators to be

$$ [\tilde{L}_i, \tilde{L}_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,\tilde{L}_k, $$

$$ [S_i, S_j] = -\sum_{k=1}^{3} \varepsilon_{ijk}\,\tilde{L}_k, $$

$$ [\tilde{L}_i, S_j] = \sum_{k=1}^{3} \varepsilon_{ijk}\,S_k, $$

thus proving that the six dimensional vector space $\mathcal{L}(\tilde{L}_1, \tilde{L}_2, \tilde{L}_3, S_1, S_2, S_3)$ is a matrix Lie algebra (see Definition 11.1.7), which is denoted $\mathfrak{so}(3, 1)$. What we have discussed gives the proof of the first part of the following proposition, which is analogous to Proposition 11.2.6.

Proposition 13.3.5 If $M$ is a matrix in $\mathfrak{so}(3, 1)$, then $e^M \in \mathrm{SO}(3, 1)^\uparrow$. When restricted to $\mathfrak{so}(3, 1)$, the exponential map is surjective onto $\mathrm{SO}(3, 1)^\uparrow$.

This means that the group of proper and orthochronous Lorentz matrices is given by spatial rotations, hyperbolic rotations (that is boosts) and their products.
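The commutation relations of $\mathfrak{so}(3, 1)$ can be verified by direct matrix multiplication. The sketch below assumes the convention $(L_i)_{jk} = -\varepsilon_{ijk}$ for the generators of $\mathfrak{so}(3)$, which satisfies $[L_i, L_j] = \sum_k \varepsilon_{ijk} L_k$; this choice is an assumption, since the matrices of Exercise 11.1.10 are not reproduced here, and the signs of the relations depend on it. All function names are ours.

```python
def eps3(i, j, k):
    """Totally antisymmetric symbol for i, j, k in {1, 2, 3}."""
    return (i - j) * (j - k) * (k - i) // 2


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(xy, yx)]


def L_tilde(i):
    """4x4 rotation generator, assuming (L_i)_{jk} = -eps_{ijk} in the spatial block."""
    m = [[0] * 4 for _ in range(4)]
    for j in (1, 2, 3):
        for k in (1, 2, 3):
            m[j][k] = -eps3(i, j, k)
    return m


def S(i):
    """4x4 boost generator S_i of (13.2)."""
    m = [[0] * 4 for _ in range(4)]
    m[0][i] = m[i][0] = 1
    return m


def combination(sign, i, j, generator):
    """Return sign * sum over k of eps_{ijk} * generator(k)."""
    total = [[0] * 4 for _ in range(4)]
    for k in (1, 2, 3):
        c = sign * eps3(i, j, k)
        g = generator(k)
        total = [[t + c * x for t, x in zip(tr, gr)] for tr, gr in zip(total, g)]
    return total


def check_relations():
    """Check the three families of commutators for all i, j."""
    for i in (1, 2, 3):
        for j in (1, 2, 3):
            if commutator(L_tilde(i), L_tilde(j)) != combination(+1, i, j, L_tilde):
                return False
            if commutator(S(i), S(j)) != combination(-1, i, j, L_tilde):
                return False
            if commutator(L_tilde(i), S(j)) != combination(+1, i, j, S):
                return False
    return True


print(check_relations())  # True
```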


13.4  Electro-Magnetism 

By recalling the framework of Sect. 1.4, in the standard euclidean formulation on the space $E^3$ representing the physical space $\mathcal{S}$ (and with an orthonormal basis) one describes the three dimensional electric field $\mathbf{E}(t, x)$ and the magnetic field $\mathbf{B}(t, x)$ as depending on both the three dimensional position vector $x = (x_1, x_2, x_3)$ and the time coordinate $t$. In this section we show that the Maxwell equations for electromagnetism can be naturally formulated in terms of the geometry of the Minkowski space $M^4$.

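Example 13.4.1 The Maxwell equations in vacuum for the pair $(\mathbf{E}(t, x), \mathbf{B}(t, x))$ are written as

$$ \mathrm{div}\,\mathbf{B} = 0, \qquad \mathrm{rot}\,\mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0, $$

$$ \mathrm{div}\,\mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad \mathrm{rot}\,\mathbf{B} = \mu_0\mathbf{J} + \mu_0\varepsilon_0\frac{\partial \mathbf{E}}{\partial t}, $$

where $\varepsilon_0$ and $\mu_0$ are the vacuum permittivity and permeability, with $c^2\varepsilon_0\mu_0 = 1$. The sources of the fields are the electric charge density $\rho$ (a scalar field) and the current density $\mathbf{J}$ (a vector field).

For the homogeneous Maxwell equations (the first two) the vector fields $\mathbf{E}$ and $\mathbf{B}$ can be written in terms of a vector potential $\mathbf{A}(t, x) = (A_1(t, x), A_2(t, x), A_3(t, x))$ and a scalar potential $\phi(t, x)$, as

$$ \mathbf{B} = \mathrm{rot}\,\mathbf{A}, \qquad \mathbf{E} = -\mathrm{grad}\,\phi - \frac{\partial \mathbf{A}}{\partial t}, $$

which makes the homogeneous equations automatically satisfied, from the identities $\mathrm{div}(\mathrm{rot}) = 0$ and $\mathrm{rot}(\mathrm{grad}) = 0$ in Exercise 1.4.1. If the potentials satisfy the so called Lorentz gauge condition

$$ \mathrm{div}\,\mathbf{A} + \frac{1}{c^2}\frac{\partial \phi}{\partial t} = 0, $$

the two Maxwell equations depending on the sources can be written as

$$ \nabla^2 A_j - \frac{1}{c^2}\frac{\partial^2 A_j}{\partial t^2} = -\mu_0 J_j, \qquad \text{for } j = 1, 2, 3, $$

$$ \nabla^2 \phi - \frac{1}{c^2}\frac{\partial^2 \phi}{\partial t^2} = -\frac{\rho}{\varepsilon_0}, $$

where $\nabla^2 = \sum_{k=1}^{3} \partial_k^2$ is the spatial Laplacian operator with $\partial_a = \partial/\partial x_a$.

If we define the four-potential as $A = (A_0 = -\phi/c,\ \mathbf{A})$, then the Lorentz gauge condition is written as (recall Definition 13.3.2 for the metric $\eta_{\mu\nu}$)

$$ \sum_{\mu,\nu=0}^{3} \eta_{\mu\nu}\,\partial_\mu A_\nu = 0, $$

where we also define $\partial_0 = \partial/\partial x_0 = \partial/c\,\partial t$. In terms of the four-current $J = (J_0 = -\rho/c\varepsilon_0,\ \mu_0\mathbf{J})$, the inhomogeneous Maxwell equations are written as

$$ \sum_{\mu,\nu=0}^{3} \eta_{\mu\nu}\,\partial_\mu\partial_\nu A_\rho = -J_\rho. $$

Using the four-dimensional 'nabla' operator $\boldsymbol{\nabla} = (\partial_0, \partial_1, \partial_2, \partial_3)$ we can then write the Lorentz gauge condition as

$$ \boldsymbol{\nabla} \cdot A = \sum_{\mu,\nu=0}^{3} \eta_{\mu\nu}\,\partial_\mu A_\nu = 0, $$

and the inhomogeneous Maxwell equations as

$$ \boldsymbol{\nabla}^2 A_\rho = \sum_{\mu,\nu=0}^{3} \eta_{\mu\nu}\,\partial_\mu\partial_\nu A_\rho = -J_\rho, \qquad \text{for } \rho = 0, 1, 2, 3, $$

thus generalising to the Minkowski spacetime the analogous operations written for the euclidean space $E^3$ in Sect. 1.4.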

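Example 13.4.2 From the relations defining the vector fields $E$ and $B$ in terms of the four-potential vector $A$, we can write for their components in the physical space

$$B_a = \sum_{b,c=1}^{3} \varepsilon_{abc}\,\partial_b A_c, \qquad E_a = c\,(\partial_a A_0 - \partial_0 A_a)$$

for $a = 1, 2, 3$. This shows that the quantity

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$$

with $\mu, \nu \in \{0, \dots, 3\}$, defines the entries of the antisymmetric field strength matrix (or more precisely field strength 'tensor') $F$ given by

$$F = (F_{\mu\nu}) = \begin{pmatrix} 0 & -E_1/c & -E_2/c & -E_3/c \\ E_1/c & 0 & B_3 & -B_2 \\ E_2/c & -B_3 & 0 & B_1 \\ E_3/c & B_2 & -B_1 & 0 \end{pmatrix}.$$

Merging the definition of $F$ with the Lorentz gauge condition we have

$$\begin{aligned}
\sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu \partial_\nu A_\rho &= \sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu (F_{\nu\rho} + \partial_\rho A_\nu) \\
&= \sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu F_{\nu\rho} + \partial_\rho \Big( \sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu A_\nu \Big) \\
&= \sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu F_{\nu\rho},
\end{aligned}$$

so we can write the inhomogeneous Maxwell equations as

$$\sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu F_{\nu\rho} = -J_\rho, \quad \text{for } \rho = 0, 1, 2, 3.$$

The homogeneous Maxwell equations can be written in a similar way by means of another useful quantity, the dual field strength matrix (or tensor) $\tilde{F}$. For this one needs the (four dimensional) totally antisymmetric symbol $\varepsilon_{a_1 a_2 a_3 a_4}$ with indices $a_j = 0, 1, 2, 3$ and defined by

$$\varepsilon_{a_1 a_2 a_3 a_4} = \begin{cases} +1 & \text{if } (a_1, a_2, a_3, a_4) \text{ is an even permutation of } (0, 1, 2, 3) \\ -1 & \text{if } (a_1, a_2, a_3, a_4) \text{ is an odd permutation of } (0, 1, 2, 3) \\ \ \ 0 & \text{if any two indices are equal} \end{cases}$$

Also, let $\eta^{-1} = (\eta^{\mu\nu})$ be the inverse of the matrix $\eta = (\eta_{\mu\nu})$. The dual field strength matrix is the antisymmetric matrix defined by

$$\tilde{F} = (\tilde{F}_{\mu\nu}), \qquad \tilde{F}_{\mu\nu} = \frac{1}{2} \sum_{\alpha,\beta,\gamma,\delta=0}^{3} \varepsilon_{\mu\nu\gamma\delta}\,\eta^{\gamma\alpha}\,\eta^{\delta\beta}\,F_{\alpha\beta},$$

that is

$$\tilde{F} = \begin{pmatrix} 0 & B_1 & B_2 & B_3 \\ -B_1 & 0 & -E_3/c & E_2/c \\ -B_2 & E_3/c & 0 & -E_1/c \\ -B_3 & -E_2/c & E_1/c & 0 \end{pmatrix}.$$

Notice that the elements of $\tilde{F}$ are obtained from those of $F$ by the exchange

$$E \leftrightarrow -cB.$$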
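The two matrices can also be tabulated numerically. The following Python sketch is only an illustration: the function names `field_strength` and `dual_field_strength` and the sample values are assumptions, not taken from the text. It builds both matrices entry by entry as displayed above, checks that they are antisymmetric, and checks one reading of the exchange rule, namely that substituting $E \to -cB$ and $B \to -E/c$ in $F$ gives $\tilde{F}$.

```python
def field_strength(E, B, c):
    """Matrix F = (F_{mu nu}) with the entries displayed in the example."""
    e1, e2, e3 = (x / c for x in E)   # e_a stands for E_a / c
    b1, b2, b3 = B
    return [
        [0.0, -e1, -e2, -e3],
        [e1, 0.0, b3, -b2],
        [e2, -b3, 0.0, b1],
        [e3, b2, -b1, 0.0],
    ]


def dual_field_strength(E, B, c):
    """Dual matrix with the entries displayed in the example."""
    e1, e2, e3 = (x / c for x in E)
    b1, b2, b3 = B
    return [
        [0.0, b1, b2, b3],
        [-b1, 0.0, -e3, e2],
        [-b2, e3, 0.0, -e1],
        [-b3, -e2, e1, 0.0],
    ]


def is_antisymmetric(M):
    n = len(M)
    return all(M[i][j] == -M[j][i] for i in range(n) for j in range(n))


# Hypothetical sample values, chosen so that all divisions are exact in floating point.
E = (1.0, 2.0, 3.0)
B = (4.0, 5.0, 6.0)
c = 2.0

F = field_strength(E, B, c)
F_dual = dual_field_strength(E, B, c)

# Exchange rule read as: E -> -c B together with B -> -E / c.
F_swapped = field_strength([-c * b for b in B], [-e / c for e in E], c)
```

With these sample values, `F_swapped` and `F_dual` agree entry by entry.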

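A straightforward computation shows that the homogeneous Maxwell equations can be written as

$$\sum_{\mu,\nu=0}^{3} \eta^{\mu\nu}\,\partial_\mu \tilde{F}_{\nu\rho} = 0, \quad \text{for } \rho = 0, 1, 2, 3.$$

In terms of $F_{\mu\nu}$ rather than $\tilde{F}_{\mu\nu}$, these homogeneous equations are the four equations

$$\partial_\rho F_{\mu\nu} + \partial_\mu F_{\nu\rho} + \partial_\nu F_{\rho\mu} = 0$$

for $\mu, \nu, \rho$ any three of the integers $0, 1, 2, 3$.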

We now have a glimpse at the geometric nature of the four-potential $A$ and of the antisymmetric matrix $F$, that is, we study how they transform under a change of basis from $\mathcal{B}$ to $\mathcal{B}'$ for $M^4$. If two inertial observers (for the orthonormal bases $\mathcal{B}$ and $\mathcal{B}'$ for $M^4$) relate their spacetime components as in Example 13.3.4, we know from physics that for the transformed electric and magnetic fields $E'$ and $B'$ one has

$$\begin{aligned}
E'_1 &= E_1, & B'_1 &= B_1, \\
E'_2 &= \gamma\,(E_2 - vB_3), & B'_2 &= \gamma\,(B_2 + (v/c^2)E_3), \\
E'_3 &= \gamma\,(E_3 + vB_2), & B'_3 &= \gamma\,(B_3 - (v/c^2)E_2).
\end{aligned}$$

For the transformed potential $A' = (A'_\rho)$ and matrix $F' = (F'_{\rho\sigma})$ with $F'_{\rho\sigma} = \partial'_\rho A'_\sigma - \partial'_\sigma A'_\rho$ (where $\partial'_a = \partial/\partial x'_a$), one then finds

A'  =  MS'SA 
and 

F'  =  \MB’B')F  MB’B' . 

It is indeed possible to check that such identities are valid for any proper and orthochronous Lorentz matrix giving the change of orthonormal basis $\mathcal{B} \to \mathcal{B}'$.

If we denote by $M^{4*}$ the space dual to $(M^4, \eta)$, with $\{\epsilon_0, \epsilon_1, \epsilon_2, \epsilon_3\}$ the basis dual to $\mathcal{B} = (e_0, \dots, e_3)$, the definition

$$\eta(\epsilon_a, \epsilon_b) = \eta(e_a, e_b)$$

clearly defines a Minkowski quadratic form on $M^{4*}$. Also, if $\mathcal{B}$ is orthonormal, then $\mathcal{B}^*$ is orthonormal as well.

Recall now the results described in Sect. 8.1 on the dual of a vector space. The previous relations, when compared with Example 13.3.4, show that the vector $A = (A_0, \mathbf{A})$ is indeed an element in the dual space $M^{4*}$ to $M^4$, with components taken with respect to the dual basis $\mathcal{B}^*$ to $\mathcal{B}$. From Proposition 13.1.2 we see also that the matrix elements of $F$ transform as the entries of a quadratic form in $M^{4*}$ (although $F$ is antisymmetric). All this means that the components of the electromagnetic fields $E$, $B$ are the entries of an antisymmetric matrix $F$ which transforms as a 'contravariant' quadratic form under (proper and orthochronous) Lorentz transformations.


Chapter  14 

Affine  Linear  Geometry 



14.1  Affine  Spaces 

Intuitively, an affine space is a vector space without a 'preferred origin', that is, a set of points such that to each of them there is associated a model (or reference) vector space.

Definition 14.1.1 The real affine space of dimension $n$, denoted by $\mathbb{A}^n(\mathbb{R})$ or simply $\mathbb{A}^n$, is the set $\mathbb{R}^n$ equipped with the map

$$\alpha : \mathbb{A}^n \times \mathbb{A}^n \to \mathbb{R}^n$$

given by

$$\alpha\big((a_1, \dots, a_n), (b_1, \dots, b_n)\big) = (b_1 - a_1, \dots, b_n - a_n).$$

Notice that the domain of $\alpha$ is the cartesian product $\mathbb{R}^n \times \mathbb{R}^n$, while the range of $\alpha$ is the vector space $\mathbb{R}^n$. The notation $\mathbb{A}^n$ stresses the differences between an affine space structure and a vector space structure on the same set $\mathbb{R}^n$. The $n$-tuples of $\mathbb{A}^n$ are called points.

By $\mathbb{A}^1$ we have the affine real line, by $\mathbb{A}^2$ the affine real plane, by $\mathbb{A}^3$ the affine real space. There is an analogous notion of complex affine space $\mathbb{A}^n(\mathbb{C})$, modelled on the vector space $\mathbb{C}^n$.

Remark 14.1.2 The following properties for $\mathbb{A}^n$ easily follow from the above definition:

(p1) for any point $P \in \mathbb{A}^n$ and for any vector $v \in \mathbb{R}^n$, there exists a unique point $Q$ in $\mathbb{A}^n$ such that $\alpha(P, Q) = v$,

(p2) for any triple $P, Q, R$ of points in $\mathbb{A}^n$, it holds that $\alpha(P, Q) + \alpha(Q, R) = \alpha(P, R)$.

Fig. 14.1 The sum rule $(Q - P) + (R - Q) = R - P$

The property (p2) amounts to the sum rule of vectors (see Fig. 14.1).
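The two properties can be checked on concrete points. The following is a minimal Python sketch, assuming points and vectors are stored as tuples of numbers; the helper names `alpha` and `add` and the sample points are illustrative choices, not notation from the text.

```python
def alpha(P, Q):
    """The map alpha(P, Q) = Q - P, computed componentwise."""
    return tuple(q - p for p, q in zip(P, Q))


def add(P, v):
    """The point P + v, that is the unique Q with alpha(P, Q) = v (property p1)."""
    return tuple(p + x for p, x in zip(P, v))


# Three sample points in A^3 (hypothetical values).
P, Q, R = (1, 2, 0), (4, -1, 3), (0, 5, 2)

v = alpha(P, Q)                     # the vector Q - P
recovered_Q = add(P, v)             # property (p1): P + v gives back Q

# Property (p2): alpha(P, Q) + alpha(Q, R) = alpha(P, R).
sum_rule_lhs = tuple(a + b for a, b in zip(alpha(P, Q), alpha(Q, R)))
sum_rule_rhs = alpha(P, R)
```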

Remark 14.1.3 Given the points $P, Q \in \mathbb{A}^n$ and the definition of the map $\alpha$, the vector $\alpha(P, Q)$ will be also denoted by

$$v = \alpha(P, Q) = Q - P.$$

Then, from the property (p1), we shall write

$$Q = P + v.$$

The property (p2), the sum rule for vectors in $\mathbb{R}^n$, is then written as

$$(Q - P) + (R - Q) = R - P.$$

Remark 14.1.4 Given an affine space $\mathbb{A}^n$, from (p2) we have that

(a) for any $P \in \mathbb{A}^n$ it is $\alpha(P, P) = 0_{\mathbb{R}^n}$ (setting $P = Q = R$),

(b) for any pair of points $P, Q \in \mathbb{A}^n$ it is $\alpha(P, Q) = -\alpha(Q, P)$ (setting $R = P$).

A reference system in an affine space is given by selecting a point $O \in \mathbb{A}^n$, so that from (p1) we have a bijection

$$\alpha_O : \mathbb{A}^n \to \mathbb{R}^n, \qquad \alpha_O(P) = \alpha(O, P) = P - O, \tag{14.1}$$

and then a basis $\mathcal{B} = (v_1, \dots, v_n)$ for $\mathbb{R}^n$.

Definition 14.1.5 The datum $(O, \mathcal{B})$ is called an affine coordinate system or an affine reference system for $\mathbb{A}^n$ with origin $O$ and basis $\mathcal{B}$. With respect to a reference system $(O, \mathcal{B})$ for $\mathbb{A}^n$, if

$$P - O = (x_1, \dots, x_n)_{\mathcal{B}} = x_1 v_1 + \cdots + x_n v_n$$

we call $(x_1, \dots, x_n)$ the coordinates of the point $P \in \mathbb{A}^n$ and often write $P = (x_1, \dots, x_n)$. If $\mathcal{E}$ is the canonical basis for $\mathbb{R}^n$, then $(O, \mathcal{E})$ is the canonical reference system for $\mathbb{A}^n$.

Remark 14.1.6 Once an origin has been selected, the affine space $\mathbb{A}^n$ has the structure of $\mathbb{R}^n$ as a vector space. Given a reference system $(O, \mathcal{B})$ for $\mathbb{A}^n$, with $\mathcal{B} = (b_1, \dots, b_n)$, the points $A_i$ in $\mathbb{A}^n$ given by

$$A_i = O + b_i$$

for $i = 1, \dots, n$ are called the coordinate points of $\mathbb{A}^n$ with respect to $\mathcal{B}$. They have coordinates

$$A_1 = (1, 0, \dots, 0)_{\mathcal{B}}, \quad A_2 = (0, 1, \dots, 0)_{\mathcal{B}}, \quad \dots, \quad A_n = (0, 0, \dots, 1)_{\mathcal{B}}.$$

With the canonical basis $\mathcal{E} = (e_1, \dots, e_n)$ for $\mathbb{R}^n$, the coordinate points $A_i = O + e_i$ will have coordinates

$$A_1 = (1, 0, \dots, 0), \quad A_2 = (0, 1, \dots, 0), \quad \dots, \quad A_n = (0, 0, \dots, 1).$$

Definition 14.1.7 With $w \in \mathbb{R}^n$, the map

$$T_w : \mathbb{A}^n \to \mathbb{A}^n, \qquad T_w(P) = P + w$$

is called the translation of $\mathbb{A}^n$ along $w$.

It is clear that $T_w$ is a bijection between $\mathbb{A}^n$ and itself, since $T_{-w}$ is the inverse map to $T_w$. Once a reference system has been introduced in $\mathbb{A}^n$, a translation can be described by a set of equations, as the following exercise shows.

Exercise 14.1.8 Let us fix the canonical cartesian coordinate system $(O, \mathcal{E})$ for $\mathbb{A}^3$, and consider the vector $w = (1, -2, 1)$. If $P = (x, y, z) \in \mathbb{A}^3$, then $P - O = x e_1 + y e_2 + z e_3$ and we write

$$\begin{aligned}
T_w(P) - O &= (P + w) - O \\
&= (P - O) + w \\
&= (x e_1 + y e_2 + z e_3) + (e_1 - 2 e_2 + e_3) \\
&= (x + 1) e_1 + (y - 2) e_2 + (z + 1) e_3,
\end{aligned}$$

so $T_w((x, y, z)) = (x + 1, y - 2, z + 1)$.
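The computation in this exercise is a componentwise sum, which the following Python sketch reproduces. The function name `translate` and the sample point are illustrative assumptions; coordinates are taken with respect to the canonical reference system.

```python
def translate(P, w):
    """Image T_w(P) = P + w, with P and w given by their coordinates."""
    return tuple(p + wi for p, wi in zip(P, w))


w = (1, -2, 1)                      # the vector of the exercise
P = (3, 0, -4)                      # a hypothetical sample point (x, y, z)

image = translate(P, w)             # (x + 1, y - 2, z + 1)

# T_{-w} is the inverse map of T_w.
minus_w = tuple(-wi for wi in w)
back = translate(image, minus_w)
```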

Following this exercise, it is easy to obtain the equations for a generic translation.

Proposition 14.1.9 Let $\mathbb{A}^n$ be an affine space with the reference system $(O, \mathcal{B})$. With a vector $w = (w_1, \dots, w_n)_{\mathcal{B}}$ in $\mathbb{R}^n$, the translation $T_w$ has the following equations

$$T_w\big((x_1, \dots, x_n)_{\mathcal{B}}\big) = (x_1 + w_1, \dots, x_n + w_n)_{\mathcal{B}}.$$

Remark 14.1.10 The translation $T_w$ induces an isomorphism of vector spaces $f : \mathbb{R}^n \to \mathbb{R}^n$ given by

$$P - O \ \mapsto\ T_w(P) - T_w(O).$$

It is easy to see that $f$ is the identity isomorphism. By fixing the orthogonal cartesian reference system $(O, \mathcal{E})$ for $\mathbb{A}^n$, with corresponding coordinates $(x_1, \dots, x_n)$ for a point $P$, and a vector $w = w_1 e_1 + \cdots + w_n e_n$, we can write

$$\mathbb{R}^n \ni P - O = x_1 e_1 + \cdots + x_n e_n$$

and

$$T_w(P) = (x_1 + w_1, \dots, x_n + w_n), \qquad T_w(O) = (w_1, \dots, w_n),$$

so that we compute

$$\begin{aligned}
T_w(P) - T_w(O) &= (T_w(P) - O) - (T_w(O) - O) \\
&= \big((x_1 + w_1) e_1 + \cdots + (x_n + w_n) e_n\big) - (w_1 e_1 + \cdots + w_n e_n) \\
&= x_1 e_1 + \cdots + x_n e_n = P - O.
\end{aligned}$$

More precisely, such an isomorphism is defined between two distinct copies of the vector space $\mathbb{R}^n$, those associated to the points $O$ and $O' = T_w(O)$ in $\mathbb{A}^n$, thought of as the origins of two different reference systems for $\mathbb{A}^n$. This is depicted in Fig. 14.2.

Fig. 14.2 The translation $T_w$

From the notion of vector line in $\mathbb{R}^2$, using the bijection $\alpha_O : \mathbb{A}^2 \to \mathbb{R}^2$ given in (14.1), it is natural to define a (straight) line by the origin as the subset in $\mathbb{A}^2$ that corresponds to $\mathcal{L}(v)$ in $\mathbb{R}^2$.

Exercise 14.2.1 Consider $v = (1, 2) \in \mathbb{R}^2$. The corresponding line by the origin in $\mathbb{A}^2$ is the set

$$\{P \in \mathbb{A}^2 : (P - O) \in \mathcal{L}(v)\} = \{(x, y) = \lambda(1, 2),\ \lambda \in \mathbb{R}\}.$$

Based on this, we have the following definition.

Definition 14.2.2 A (straight) line by the origin in $\mathbb{A}^n$ is the subset

$$r_0 = \{P \in \mathbb{A}^n : (P - O) \in \mathcal{L}(v)\}$$

for a vector $v \in \mathbb{R}^n \setminus \{0\}$. The vector $v$ is called the direction vector of $r_0$.

Using the identification between $\mathbb{A}^n$ and $\mathbb{R}^n$ given in (14.1) we write

$$r_0 = \{P \in \mathbb{A}^n : P = \lambda v,\ \lambda \in \mathbb{R}\},$$

or even

$$r_0 : P = \lambda v, \quad \lambda \in \mathbb{R}.$$

We call such an expression the vector equation for the line $r_0$. Once a reference system $(O, \mathcal{B})$ for $\mathbb{A}^n$ is chosen, via the identification of the components of $P - O$ with respect to $\mathcal{B}$ with the coordinates of a point $P$, we write the vector equation above as

$$r_0 : (x_1, \dots, x_n) = \lambda(v_1, \dots, v_n), \quad \text{with } \lambda \in \mathbb{R},$$

with $v = (v_1, \dots, v_n)$ providing the direction of the line.

Remark 14.2.3 It is clear that the subset $r_0$ coincides with $\mathcal{L}(v)$, although they belong to different spaces, that is $r_0 \subset \mathbb{A}^n$ while $\mathcal{L}(v) \subset \mathbb{R}^n$. With such a caveat, these sets will be often identified.

Exercise 14.2.4 The line $r_0$ in $\mathbb{A}^3$ with direction vector $v = (1, 2, 3)$ has the vector equation

$$r_0 : (x, y, z) = \lambda(1, 2, 3), \quad \lambda \in \mathbb{R}.$$

Exercise 14.2.5 Consider the affine space $\mathbb{A}^2$ with the orthogonal reference system $(O, \mathcal{E})$. The subset given by

$$r = \{(x, y) = (1, 2) + \lambda(0, 1),\ \lambda \in \mathbb{R}\}$$

clearly represents a line that runs parallel to the second reference axis. Under the translation $T_u$ with $u = (-1, -2)$ we get the set

$$T_u(r) = \{P + u,\ P \in r\} = \{(x, y) = \lambda(0, 1),\ \lambda \in \mathbb{R}\},$$

which is a line by the origin (indeed the second axis of the reference system). If $r_0 = T_u(r)$, a line by the origin, it is clear that $r = T_w(r_0)$, with $w = -u$.

This exercise suggests the following definition.

Definition 14.2.6 A set $r \subset \mathbb{A}^n$ is called a line if there exist a translation $T_w$ in $\mathbb{A}^n$ and a line $r_0$ by the origin such that $r = T_w(r_0)$.

Since the sets $r_0$ and $\mathcal{L}(v)$ in $\mathbb{R}^n$ coincide, we shall refer to $\mathcal{L}(v)$ as the direction of $r$, and we shall denote it by $S_r$ (with the letter $S$ referring to the fact that $\mathcal{L}(v)$ is a vector subspace in $\mathbb{R}^n$). Notice that, for a line, it is $\dim(S_r) = 1$.

The equation for an arbitrary line follows easily from that of a line by the origin. Let us consider a line by the origin,

$$r_0 : P = \lambda v, \quad \lambda \in \mathbb{R},$$

and the translation $T_w$ with $w \in \mathbb{R}^n$. If $w = Q - O$, the line $r = T_w(r_0)$ is given by

$$r = \{P \in \mathbb{A}^n : P = T_w(P_0),\ P_0 \in r_0\} = \{P \in \mathbb{A}^n : P = Q + \lambda v,\ \lambda \in \mathbb{R}\},$$

so we write

$$r : P = Q + \lambda v. \tag{14.2}$$

With respect to a reference system $(O, \mathcal{B})$, where $Q = (q_1, \dots, q_n)_{\mathcal{B}}$ and $v = (v_1, \dots, v_n)_{\mathcal{B}}$, the previous equation can be written as

$$r : (x_1, \dots, x_n) = (q_1, \dots, q_n) + \lambda(v_1, \dots, v_n), \tag{14.3}$$

or equivalently

$$r : \begin{cases} x_1 = q_1 + \lambda v_1 \\ \quad \vdots \\ x_n = q_n + \lambda v_n \end{cases} \tag{14.4}$$

Definition 14.2.7 The expression (14.2) (or equivalently (14.3)) is called the vector equation of the line $r$, while the expression (14.4) is called the parametric equation of $r$ (stressing that $\lambda$ is a real parameter).

Fig. 14.3 The translation $T_{w'}$ with $w' - w \in \mathcal{L}(v)$ maps $r$ into $r_0$

Remark 14.2.8 Consider the line whose vector equation is $r : P = Q + \lambda v$.

(a) We have a unique point in $r$ for each value of $\lambda$, and selecting a point of $r$ gives a unique value for $\lambda$. The point in $r$ is $Q$ if and only if $\lambda = 0$;

(b) The direction of $r$ is clearly the vector line $\mathcal{L}(v)$. This means that the direction vector $v$ is not uniquely determined by the equation, since each non zero element $v' \in \mathcal{L}(v)$ is a direction vector for $r$. This arbitrariness can be re-absorbed by a suitable rescaling of the parameter $\lambda$: with a rescaling the equation for $r$ can always be written in the given form with $v$ its direction vector.

(c) The point $Q$ is not unique. As Fig. 14.3 shows, if $Q = O + w$ is a point in $r$, then any translation $T_{w'}$ with $w' - w \in \mathcal{L}(v)$ maps $r$ into the same line by the origin.

Exercise 14.2.9 We check whether the following lines coincide:

$$r : (x, y) = (1, 2) + \lambda(1, -1), \qquad r' : (x, y) = (2, 1) + \mu(1, -1).$$

Clearly $r$ and $r'$ have the same direction, which is $S_r = S_{r'} = \mathcal{L}((1, -1)) = r_0$. If we consider $Q = (1, 2) \in r$ and $Q' = (2, 1) \in r'$, with $w = Q - O = (1, 2)$ and $w' = Q' - O = (2, 1)$, we compute

$$r = T_w(r_0), \qquad r' = T_{w'}(r_0).$$

We have that $r$ coincides with $r'$ since, as described in the remark above,

$$w - w' = (-1, 1) \in \mathcal{L}((1, -1)).$$
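The criterion used in this exercise (same direction, and difference of base points lying in that direction) can be tested mechanically. The sketch below is one possible way to do it in Python, assuming integer or rational coordinates; linear dependence of two vectors is tested through the vanishing of all $2 \times 2$ minors, and the names `dependent` and `same_line` are illustrative.

```python
from itertools import combinations


def dependent(u, v):
    """True when u and v are linearly dependent (all 2x2 minors vanish)."""
    return all(u[i] * v[j] - u[j] * v[i] == 0
               for i, j in combinations(range(len(u)), 2))


def same_line(Q, v, Q2, v2):
    """Do P = Q + lam*v and P = Q2 + mu*v2 describe the same line?

    Assumes v and v2 are non zero direction vectors.
    """
    diff = tuple(a - b for a, b in zip(Q, Q2))      # the vector w - w'
    return dependent(v, v2) and dependent(diff, v)


# The two lines of the exercise.
coincide = same_line((1, 2), (1, -1), (2, 1), (1, -1))

# A parallel but distinct line through the origin, for comparison.
distinct = same_line((1, 2), (1, -1), (0, 0), (1, -1))
```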

In analogy with the definition of affine lines, one defines planes in $\mathbb{A}^n$.

Definition 14.2.10 A plane through the origin in $\mathbb{A}^n$ is any subset of the form

$$\pi_0 = \{P \in \mathbb{A}^n : (P - O) \in \mathcal{L}(u, v)\},$$

with two linearly independent vectors $u, v \in \mathbb{R}^n$.

With the usual identification of a point $P \in \mathbb{A}^n$ with its image $\alpha_O(P) \in \mathbb{R}^n$ (see (14.1)), we write

$$\pi_0 = \{P \in \mathbb{A}^n : P = \lambda u + \mu v,\ \lambda, \mu \in \mathbb{R}\},$$

or also

$$\pi_0 : P = \lambda u + \mu v$$

with $\lambda, \mu$ real parameters.

Definition 14.2.11 A subset $\pi \subset \mathbb{A}^n$ is called a plane if there exist a translation map $T_w$ in $\mathbb{A}^n$ and a plane $\pi_0$ through the origin such that $\pi = T_w(\pi_0)$. Since we can identify the elements in $\pi_0$ with the vectors in $\mathcal{L}(u, v) \subset \mathbb{R}^n$, generalising the analogous Definition 14.2.6 for a line, we define the space $S_\pi = \mathcal{L}(u, v)$ to be the direction of $\pi$. Notice that $\dim(S_\pi) = 2$.

If $Q = T_w(O)$, that is $w = Q - O$, the points $P \in \pi$ are characterised by

$$P = Q + \lambda u + \mu v. \tag{14.5}$$

Let $(O, \mathcal{B})$ be a reference system for $\mathbb{A}^n$. If $Q = (q_1, \dots, q_n)_{\mathcal{B}} \in \mathbb{A}^n$, with $u = (u_1, \dots, u_n)_{\mathcal{B}}$ and $v = (v_1, \dots, v_n)_{\mathcal{B}} \in \mathbb{R}^n$, the above equation can be written as

$$\pi : \begin{cases} x_1 = q_1 + \lambda u_1 + \mu v_1 \\ \quad \vdots \\ x_n = q_n + \lambda u_n + \mu v_n \end{cases} \tag{14.6}$$

The relation (14.5) is the vector equation of the plane $\pi$, while (14.6) is a parametric equation of $\pi$.

Exercise 14.2.12 Given the linearly independent vectors $v_1 = (1, 0, 1)$ and $v_2 = (1, -1, 0)$ with respect to the basis $\mathcal{B}$ in $\mathbb{R}^3$, the plane $\pi_0$ through the origin associated to them is the set of points $P \in \mathbb{A}^3$ given by the vector equation

$$P = \lambda_1 v_1 + \lambda_2 v_2, \quad \lambda_1, \lambda_2 \in \mathbb{R}.$$

With the reference system $(O, \mathcal{B})$, with $P = (x, y, z)$, its parametric equation is

$$\begin{cases} x = \lambda_1 + \lambda_2 \\ y = -\lambda_2 \\ z = \lambda_1 \end{cases}$$

Exercise 14.2.13 Given the translation $T_w$ in $\mathbb{A}^3$ with $w = (1, -1, 2)$ in a basis $\mathcal{B}$, the plane $\pi_0$ of the previous exercise is mapped into the plane $\pi = T_w(\pi_0)$ whose vector equation is

$$\pi : P = Q + \lambda_1 v_1 + \lambda_2 v_2,$$

with $Q = T_w(O) = (1, -1, 2)$. We can equivalently represent the points in $\pi$ as

$$\pi : (x, y, z) = (1, -1, 2) + \lambda_1(1, 0, 1) + \lambda_2(1, -1, 0).$$

Exercise 14.2.14 Let us consider the vectors $v_1, v_2$ in $\mathbb{R}^4$ with the following components

$$v_1 = (1, 0, 1, 0), \qquad v_2 = (2, 1, 0, -1)$$

in a basis $\mathcal{B}$, and the point $Q$ in $\mathbb{A}^4$ with coordinates

$$Q = (2, 1, 1, 2)$$

in the corresponding reference system $(O, \mathcal{B})$. The plane $\pi \subset \mathbb{A}^4$ through $Q$ whose direction is $S_\pi = \mathcal{L}(v_1, v_2)$ has the vector equation

$$\pi : (x_1, x_2, x_3, x_4) = (2, 1, 1, 2) + \lambda_1(1, 0, 1, 0) + \lambda_2(2, 1, 0, -1)$$

and parametric equation

$$\pi : \begin{cases} x_1 = 2 + \lambda_1 + 2\lambda_2 \\ x_2 = 1 + \lambda_2 \\ x_3 = 1 + \lambda_1 \\ x_4 = 2 - \lambda_2 \end{cases}$$


Remark 14.2.15 The natural generalisation of Remark 14.2.8 holds for planes as well. A vector equation for a given plane $\pi$ is not unique. If

$$\pi : P = Q + \lambda u + \mu v, \qquad \pi' : P = Q' + \lambda u' + \mu v'$$

are two planes in $\mathbb{A}^n$, then

$$\pi = \pi' \iff \begin{cases} S_\pi = S_{\pi'} \quad (\text{that is } \mathcal{L}(u, v) = \mathcal{L}(u', v')) \\ Q - Q' \in S_\pi \end{cases}$$

Proposition 14.2.16 Given two distinct points $A, B$ in $\mathbb{A}^n$ (with $n \geq 2$), there is only one line through $A$ and $B$. A vector equation is

$$r_{AB} : P = A + \lambda(B - A).$$

Proof Since $A \neq B$, this vector equation gives a line, because $B - A$ is a non zero vector and the set of vectors $P - A$ is a one dimensional vector space (that is, the direction is one dimensional). The line $r_{AB}$ contains $A$ (for $\lambda = 0$) and $B$ (for $\lambda = 1$). This shows there exists a line through $A$ and $B$.

Let us consider another line $r_A$ through $A$. Its vector equation will be $P = A + \mu v$, with $v \in \mathbb{R}^n$ and $\mu$ a real parameter. The point $B$ is contained in $r_A$ if and only if there exists a value $\mu_0$ of the parameter such that $B = A + \mu_0 v$, that is $B - A = \mu_0 v$. Thus the direction of $r_A$ would be $S_{r_A} = \mathcal{L}(v) = \mathcal{L}(B - A) = S_{r_{AB}}$. The line $r_A$ then coincides with $r_{AB}$. □

Exercise 14.2.17 The line in $\mathbb{A}^2$ through the points $A = (1, 2)$ and $B = (1, -2)$ has equation

$$P = (x, y) = (1, 2) + \lambda(0, -4).$$

Exercise 14.2.18 Let the points $A$ and $B$ in $\mathbb{A}^3$ have coordinates $A = (1, 1, 1)$ and $B = (1, 2, -2)$. The line $r_{AB}$ through them has the vector equation

$$(x, y, z) = (1, 1, 1) + \lambda(0, 1, -3).$$

Does the point $P = (1, 0, 4)$ belong to $r_{AB}$? In order to answer this question we need to check whether there is a $\lambda \in \mathbb{R}$ that solves the linear system

$$\begin{cases} 1 = 1 \\ 0 = 1 + \lambda \\ 4 = 1 - 3\lambda \end{cases}$$

It is evident that $\lambda = -1$ is a solution, so $P$ is a point in $r_{AB}$.
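The membership test of this exercise can be sketched in Python as follows. The function name `parameter_on_line` is an illustrative assumption; the value of $\lambda$ is read off from the first non zero component of the direction vector and then checked against every equation of the system, using exact rational arithmetic.

```python
from fractions import Fraction


def parameter_on_line(P, A, v):
    """Return lam with P = A + lam*v, or None if P is not on the line.

    Assumes v is a non zero direction vector.
    """
    k = next(i for i, vi in enumerate(v) if vi != 0)   # a usable equation
    lam = Fraction(P[k] - A[k], v[k])
    on_line = all(P[i] == A[i] + lam * v[i] for i in range(len(v)))
    return lam if on_line else None


A, B = (1, 1, 1), (1, 2, -2)
v = tuple(b - a for a, b in zip(A, B))     # direction B - A

lam = parameter_on_line((1, 0, 4), A, v)   # the point P of the exercise
off = parameter_on_line((1, 0, 0), A, v)   # a point not on the line
```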

An analogue of Proposition 14.2.16 holds for three points in an affine space.

Proposition 14.2.19 Let $A, B, C$ be three points in an affine space $\mathbb{A}^n$ (with $n \geq 3$). If they are not contained in the same line, there exists a unique plane $\pi_{ABC}$ through them, with a vector equation given by

$$\pi_{ABC} : P = A + \lambda(B - A) + \mu(C - A).$$

Proof The vectors $B - A$ and $C - A$ are linearly independent, since the three points are not contained in the same line. The direction of $\pi_{ABC}$ is then two dimensional, with $S_{\pi_{ABC}} = \mathcal{L}(B - A, C - A)$. The point $A$ is in $\pi_{ABC}$, corresponding to $P(\lambda = \mu = 0)$; the point $B$ is in $\pi_{ABC}$, corresponding to $P(\lambda = 1, \mu = 0)$; the point $C$ is in $\pi_{ABC}$, corresponding to $P(\lambda = 0, \mu = 1)$.

We have then proven that a plane through $A, B, C$ exists. Let us suppose that

$$\pi' : P = A + \lambda u + \mu v$$

gives a plane through the points $A, B, C$ (which are not on the same line) with $u$ and $v$ linearly independent (so its direction is given by $S_{\pi'} = \mathcal{L}(u, v)$). This means that $B - A \in \mathcal{L}(u, v)$ and $C - A \in \mathcal{L}(u, v)$. Since the spaces are both two dimensional, this reads $\mathcal{L}(B - A, C - A) = \mathcal{L}(u, v)$, proving that $\pi'$ coincides with $\pi_{ABC}$. □

Exercise 14.2.20 Let $A = (1, 2, 0)$, $B = (1, 1, 1)$ and $C = (0, 1, -1)$ be three points in $\mathbb{A}^3$. They are not on the same line, since the vectors $B - A = (0, -1, 1)$ and $C - A = (-1, -1, -1)$ are linearly independent. A vector equation of the plane $\pi_{ABC}$ is

$$\pi : (x, y, z) = (1, 2, 0) + \lambda(0, -1, 1) + \mu(-1, -1, -1).$$
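For points in $\mathbb{A}^3$ the construction of this exercise can be sketched in Python. Linear independence of $B - A$ and $C - A$ is tested here through the cross product, which is a shortcut valid only in three dimensions and is not the method of the text; the names `plane_through` and `point_of_plane` are illustrative.

```python
def sub(P, Q):
    """The vector P - Q."""
    return tuple(p - q for p, q in zip(P, Q))


def cross(u, v):
    """Cross product in R^3; it vanishes exactly when u, v are dependent."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_through(A, B, C):
    """Data (A, B - A, C - A) of the vector equation, or None if collinear."""
    u, v = sub(B, A), sub(C, A)
    if cross(u, v) == (0, 0, 0):
        return None
    return A, u, v


def point_of_plane(plane, lam, mu):
    """The point A + lam*u + mu*v of the plane."""
    Q, u, v = plane
    return tuple(q + lam * a + mu * b for q, a, b in zip(Q, u, v))


A, B, C = (1, 2, 0), (1, 1, 1), (0, 1, -1)
plane = plane_through(A, B, C)
```

Evaluating `point_of_plane` at the parameter pairs $(0, 0)$, $(1, 0)$ and $(0, 1)$ returns $A$, $B$ and $C$, as in the proof of Proposition 14.2.19.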


14.3  General  Linear  Affine  Varieties  and  Parallelism 

The natural generalisation of (straight) lines and planes leads to the definition of a linear affine variety $L$ in $\mathbb{A}^n$, where the direction of $L$ is a subspace in $\mathbb{R}^n$ of dimension greater than 2.

Definition 14.3.1 A linear affine variety of dimension $k$ in $\mathbb{A}^n$ is a set

$$L = \{P \in \mathbb{A}^n : (P - Q) \in V\},$$

where $Q$ is a point in the affine space $\mathbb{A}^n$ and $V \subset \mathbb{R}^n$ is a vector subspace of dimension $k$ in $\mathbb{R}^n$. The vector subspace $V$ is called the direction of the variety $L$, and denoted by $S_L = V$. If $V = \mathcal{L}(v_1, \dots, v_k)$, a vector equation for $L$ is

$$L : P = Q + \lambda_1 v_1 + \cdots + \lambda_k v_k$$

for scalars $\lambda_1, \dots, \lambda_k$ in $\mathbb{R}$.

Remark 14.3.2 It is evident that a line is a one dimensional linear affine variety, while a plane is a two dimensional linear affine variety.

Definition 14.3.3 A linear affine variety of dimension $n - 1$ in $\mathbb{A}^n$ is called a hyperplane in $\mathbb{A}^n$.

It is clear that a line is a hyperplane in $\mathbb{A}^2$, while a plane is a hyperplane in $\mathbb{A}^3$.

Exercise 14.3.4 We consider the affine space $\mathbb{A}^4$, the point $Q$ with coordinates $Q = (2, 1, 1, 2)$ with respect to a given reference system $(O, \mathcal{B})$, and the vector subspace $S = \mathcal{L}(v_1, v_2, v_3)$ in $\mathbb{R}^4$ with generators $v_1 = (1, 0, 1, 0)$, $v_2 = (2, 1, 0, -1)$, $v_3 = (0, 0, -1, 1)$ with respect to $\mathcal{B}$. The vector equation of the linear affine variety $L$ with direction $S_L = \mathcal{L}(v_1, v_2, v_3)$ and containing $Q$ is

$$L : (x_1, x_2, x_3, x_4) = (2, 1, 1, 2) + \lambda_1(1, 0, 1, 0) + \lambda_2(2, 1, 0, -1) + \lambda_3(0, 0, -1, 1),$$

while its parametric equation reads

$$\begin{cases} x_1 = 2 + \lambda_1 + 2\lambda_2 \\ x_2 = 1 + \lambda_2 \\ x_3 = 1 + \lambda_1 - \lambda_3 \\ x_4 = 2 - \lambda_2 + \lambda_3 \end{cases}$$

Definition 14.3.5 Let $L, L'$ be two linear affine varieties of the same dimension $k$ in $\mathbb{A}^n$. We say that $L$ is parallel to $L'$ if they have the same direction, that is if $S_L = S_{L'}$.

Exercise 14.3.6 Let $L_0 \subset \mathbb{A}^n$ be a line through the origin. A line $L$ in $\mathbb{A}^n$ is parallel to $L_0$ if and only if $L = T_w(L_0)$, for $w \in \mathbb{R}^n$. From Remark 14.2.15 we know that $L = L_0$ if and only if $w \in S_L$.

Let us consider the line through the origin in $\mathbb{A}^2$ given by $L_0 : (x, y) = \lambda(3, -2)$. A line will be parallel to $L_0$ if and only if its vector equation is given by

$$L : (x, y) = (\alpha, \beta) + \lambda(3, -2),$$

with $(\alpha, \beta) \in \mathbb{R}^2$. The line $L$ is moreover distinct from $L_0$ if and only if $(\alpha, \beta) \notin S_L$.

Definition 14.3.7 Let us consider in $\mathbb{A}^n$ a linear affine variety $L$ of dimension $k$ and a second linear affine variety $L'$ of dimension $k'$, with $k > k'$. The variety $L$ is said to be parallel to $L'$ if $S_{L'} \subset S_L$, that is if the direction of $L'$ is a subspace of the direction of $L$.

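Exercise 14.3.8 Let us consider in $\mathbb{A}^3$ the plane given by

$$\pi : (x, y, z) = (0, 2, -1) + \lambda_1(1, 0, 1) + \lambda_2(0, 1, 1).$$

We check whether the following lines,

$$\begin{aligned}
r_1 &: (x, y, z) = \lambda(1, -1, 0) \\
r_2 &: (x, y, z) = (0, 3, 0) + \lambda(1, 1, 2) \\
r_3 &: (x, y, z) = (1, -1, 1) + \lambda(1, 1, 1),
\end{aligned}$$

are parallel to $\pi$.

If $S_\pi$ denotes the direction of $\pi$, we clearly have that $S_\pi = \mathcal{L}(w_1, w_2) = \mathcal{L}((1, 0, 1), (0, 1, 1))$, while we denote by $v_i$ a vector spanning the direction $S_{r_i}$ of the line $r_i$, $i = 1, 2, 3$. To verify whether $S_{r_i} \subset S_\pi$ it is sufficient to compute the rank of the matrix whose rows are given by $(w_1, w_2, v_i)$.

• For $i = 1$, after a reduction procedure we have

$$\begin{pmatrix} w_1 \\ w_2 \\ v_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & -1 & 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$

Since this matrix has rank 2, we have that $v_1 \in \mathcal{L}(w_1, w_2)$, that is $S_{r_1} \subset S_\pi$. We conclude that $r_1$ is parallel to $\pi$. One also checks that $r_1 \not\subset \pi$, since $(0, 0, 0) \in r_1$ but $(0, 0, 0) \notin \pi$. To show this, one notices that the origin $(0, 0, 0)$ is contained in $\pi$ if and only if the linear system

$$(0, 0, 0) = (0, 2, -1) + \lambda_1(1, 0, 1) + \lambda_2(0, 1, 1) \quad \Rightarrow \quad \begin{cases} 0 = \lambda_1 \\ 0 = 2 + \lambda_2 \\ 0 = -1 + \lambda_1 + \lambda_2 \end{cases}$$

has a solution. It is evident that such a solution does not exist.

• For $i = 2$ we proceed as above. The following reduction by rows

$$\begin{pmatrix} w_1 \\ w_2 \\ v_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

shows that $v_2 \in \mathcal{L}(w_1, w_2)$, thus $r_2$ is parallel to $\pi$. Now $r_2 \subset \pi$: a point $P$ is in $r_2$ if and only if there exists a $\lambda \in \mathbb{R}$ such that $P = (\lambda, \lambda + 3, 2\lambda)$. For any value of $\lambda$, the linear system

$$(\lambda, \lambda + 3, 2\lambda) = (0, 2, -1) + \lambda_1(1, 0, 1) + \lambda_2(0, 1, 1) \quad \Rightarrow \quad \begin{cases} \lambda = \lambda_1 \\ \lambda + 3 = 2 + \lambda_2 \\ 2\lambda = -1 + \lambda_1 + \lambda_2 \end{cases}$$

has the unique solution $\lambda_1 = \lambda$, $\lambda_2 = \lambda + 1$.

• For $i = 3$ the following reduction by rows

$$\begin{pmatrix} w_1 \\ w_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix}$$

shows that the matrix ${}^t(w_1, w_2, v_3)$ has rank 3, so $r_3$ is not parallel to $\pi$.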
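The rank test used in this exercise can be reproduced with a small Gaussian elimination. The following Python sketch works in exact rational arithmetic; the function names `rank` and `line_parallel_to_plane` are illustrative assumptions, and the elimination is a plain textbook procedure rather than a specific routine from the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given by its rows, via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0                                   # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def line_parallel_to_plane(v, w1, w2):
    """True when v lies in L(w1, w2), with w1, w2 linearly independent."""
    return rank([w1, w2, v]) == 2


w1, w2 = (1, 0, 1), (0, 1, 1)               # direction of the plane
directions = {"r1": (1, -1, 0), "r2": (1, 1, 2), "r3": (1, 1, 1)}
parallel = {name: line_parallel_to_plane(v, w1, w2)
            for name, v in directions.items()}
```

The dictionary `parallel` then records that $r_1$ and $r_2$ are parallel to $\pi$ while $r_3$ is not, in agreement with the three reductions above.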

Definition 14.3.9 Let $L, L' \subset \mathbb{A}^n$ be two distinct linear affine varieties. We say that $L$ and $L'$ are incident if their intersection is non empty, while they are said to be skew if they are neither parallel nor incident.

Remark 14.3.10 It is easy to see that two lines (or a line and a plane) are incident if they have a common point. Two distinct planes in $\mathbb{A}^n$ (with $n \geq 3$) are incident if they have a common line.

Exercise 14.3.11 In the affine space $\mathbb{A}^3$ we consider the line $r_3$ and the plane $\pi$ as in Exercise 14.3.8. We know already that they are not parallel, and a point $P = (x, y, z)$ belongs to the intersection $r_3 \cap \pi$ if and only if there exists a $\lambda$ such that $P = (1 + \lambda, -1 + \lambda, 1 + \lambda) \in r_3$ and there exist scalars $\lambda_1, \lambda_2$ such that $P = (\lambda_1, 2 + \lambda_2, -1 + \lambda_1 + \lambda_2) \in \pi$. These conditions are equivalent to the linear system

$$\begin{cases} 1 + \lambda = \lambda_1 \\ -1 + \lambda = 2 + \lambda_2 \\ 1 + \lambda = -1 + \lambda_1 + \lambda_2 \end{cases}$$

that has the unique solution $(\lambda = 4, \lambda_1 = 5, \lambda_2 = 1)$. This corresponds to $P = (5, 3, 5) \in r_3 \cap \pi$.
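The linear system of this exercise can be solved mechanically. The sketch below rewrites it as $\lambda_1 w_1 + \lambda_2 w_2 - \lambda v = Q_r - Q_\pi$ and applies Cramer's rule, which is one convenient choice for a $3 \times 3$ system and not necessarily the method intended in the text; the names `det3` and `solve3` are illustrative.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given by rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(cols, rhs):
    """Solve x1*cols[0] + x2*cols[1] + x3*cols[2] = rhs by Cramer's rule.

    Returns None when the determinant vanishes (no unique solution).
    """
    def matrix(c0, c1, c2):
        return [[c0[k], c1[k], c2[k]] for k in range(3)]

    d = det3(matrix(*cols))
    if d == 0:
        return None
    solution = []
    for j in range(3):
        replaced = list(cols)
        replaced[j] = rhs                    # replace the j-th column
        solution.append(Fraction(det3(matrix(*replaced)), d))
    return tuple(solution)


Q_r, v = (1, -1, 1), (1, 1, 1)               # the line r3
Q_pi, w1, w2 = (0, 2, -1), (1, 0, 1), (0, 1, 1)   # the plane pi

rhs = tuple(a - b for a, b in zip(Q_r, Q_pi))
minus_v = tuple(-x for x in v)
lam1, lam2, lam = solve3((w1, w2, minus_v), rhs)

P = tuple(q + lam * x for q, x in zip(Q_r, v))    # intersection point
```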

Exercise 14.3.12 Consider again the lines $r_1$ and $r_2$ in Exercise 14.3.8. We know they are not parallel, since $v_1 \notin \mathcal{L}(v_2)$. They are not incident: there are indeed no values of $\lambda$ and $\mu$ such that a point $P = \lambda(1, -1, 0)$ in $r_1$ coincides with a point $P = (0, 3, 0) + \mu(1, 1, 2)$ in $r_2$, since the following linear system

$$\begin{cases} \lambda = \mu \\ -\lambda = 3 + \mu \\ 0 = 2\mu \end{cases}$$

has no solution. Thus $r_1$ and $r_2$ are skew.
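For two lines in $\mathbb{A}^3$ the same conclusion can be reached without solving the system. The Python sketch below uses a different, standard criterion: two non parallel lines $P = Q_1 + \lambda v_1$ and $P = Q_2 + \mu v_2$ meet exactly when the determinant with rows $v_1$, $v_2$, $Q_2 - Q_1$ vanishes. This replaces the elimination used in the exercise and works only in dimension three; the function name `classify_lines` is illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given by rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def classify_lines(Q1, v1, Q2, v2):
    """Classify two lines of A^3 as 'parallel', 'incident' or 'skew'.

    'parallel' here also covers the case of coincident lines.
    """
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    if cross == (0, 0, 0):
        return "parallel"
    d = tuple(b - a for a, b in zip(Q1, Q2))       # the vector Q2 - Q1
    return "incident" if det3([v1, v2, d]) == 0 else "skew"


# The lines r1 and r2 of the exercise.
result = classify_lines((0, 0, 0), (1, -1, 0), (0, 3, 0), (1, 1, 2))
```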

Exercise 14.3.13 Given the planes

$$\begin{aligned}
\pi &: (x, y, z) = (0, 2, -1) + \lambda_1(1, 0, 1) + \lambda_2(0, 1, 1) \\
\pi' &: (x, y, z) = (1, -1, 1) + \lambda_1(0, 0, 1) + \lambda_2(2, 1, -1)
\end{aligned}$$

in $\mathbb{A}^3$, we determine all the lines which are parallel to both $\pi$ and $\pi'$.

We denote by $r$ a generic line satisfying such a condition. From Definition 14.3.7, we require that $S_r \subset S_\pi \cap S_{\pi'}$ for the direction $S_r$ of $r$. Since $S_\pi = \mathcal{L}((1, 0, 1), (0, 1, 1))$ while $S_{\pi'} = \mathcal{L}((0, 0, 1), (2, 1, -1))$, in order to compute $S_\pi \cap S_{\pi'}$ we write the condition

$$\alpha(1, 0, 1) + \beta(0, 1, 1) = \alpha'(0, 0, 1) + \beta'(2, 1, -1)$$

as the linear homogeneous system for $(\alpha, \beta, \alpha', \beta')$ given by $\Sigma : AX = 0$ with

$$A = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix}, \qquad X = \begin{pmatrix} \alpha \\ \beta \\ -\alpha' \\ -\beta' \end{pmatrix}.$$

The space of solutions for such a linear system is easily found to be

$$S_\Sigma = \{(\alpha, \beta, -\alpha', -\beta') = t\,(2, 1, -4, -1) : t \in \mathbb{R}\},$$

so we have that the intersection $S_\pi \cap S_{\pi'}$ is one dimensional and spanned by the vector

$$2(1, 0, 1) + (0, 1, 1) = 4(0, 0, 1) + (2, 1, -1) = (2, 1, 3).$$

This gives that $S_r = \mathcal{L}((2, 1, 3))$, so we finally write

$$r : (x, y, z) = (a, b, c) + \lambda(2, 1, 3)$$

for an arbitrary $(a, b, c) \in \mathbb{A}^3$.
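The solution found in this exercise is easy to verify numerically. The following Python sketch only checks the stated result, it does not compute the solution space: it confirms that $X = (2, 1, -4, -1)$ solves $AX = 0$ and that both linear combinations give the same spanning vector. The helper names `mat_vec` and `combo` are illustrative.

```python
def mat_vec(M, x):
    """Product of the matrix M (list of rows) with the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def combo(c1, u, c2, v):
    """The linear combination c1*u + c2*v."""
    return tuple(c1 * a + c2 * b for a, b in zip(u, v))


A = [[1, 0, 0, 2],
     [0, 1, 0, 1],
     [1, 1, 1, -1]]

X = (2, 1, -4, -1)                 # stands for (alpha, beta, -alpha', -beta')
alpha, beta = X[0], X[1]
alpha_p, beta_p = -X[2], -X[3]

residual = mat_vec(A, X)                                  # should vanish
from_pi = combo(alpha, (1, 0, 1), beta, (0, 1, 1))        # vector in S_pi
from_pi_prime = combo(alpha_p, (0, 0, 1), beta_p, (2, 1, -1))
```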


14.4  The  Cartesian  Form  of  Linear  Affine  Varieties 

In the previous sections we have seen that a linear affine variety can be described either with a vector equation or a parametric equation. In this section we relate linear affine varieties to systems of linear equations.

Proposition 14.4.1 A linear affine variety $L \subset \mathbb{A}^n$ corresponds to the space of the solutions of an associated linear system with $m$ equations in $n$ unknowns, that is

$$\Sigma_L : AX = B, \quad \text{for } A \in \mathbb{R}^{m,n}. \tag{14.7}$$

Moreover, the space of solutions of the corresponding homogeneous linear system describes the direction space $S_L = L_0$ of $L$, that is

$$\Sigma_{L_0} : AX = 0.$$

We say that the linear system $\Sigma_L$ given in (14.7) is the cartesian equation for the linear affine variety $L$ of dimension $n - \mathrm{rk}(A)$. By computing the space of the solutions of $\Sigma_L$ in terms of $n - \mathrm{rk}(A)$ parameters, one gets the parametric equation for $L$. Conversely, given the parametric equation of $L$, its corresponding cartesian equation is given by consistently 'eliminating' all the parameters in the parametric equation. A linear affine variety can thus be represented both by a cartesian equation and by a parametric equation, which are related as

$$\begin{array}{ccc} \text{linear system } \Sigma : AX = B & \iff & \text{space of the solutions for } \Sigma : AX = B \\ \text{(cartesian equation)} & & \text{(parametric equation)} \end{array}$$

Notice that for a linear affine variety $L$ a cartesian equation is not uniquely determined: any linear system $\Sigma'$ which is equivalent to $\Sigma_L$ (that is, for which the spaces of the solutions for $\Sigma_L$ and $\Sigma'$ coincide) describes the same linear affine variety. An analogous result holds for the direction space of $L$, which is equivalently described by any homogeneous linear system $\Sigma'_{L_0}$ equivalent to $\Sigma_{L_0}$.

We avoid an explicit proof of Proposition 14.4.1 in general, and analyse the equivalence between the two descriptions via the following examples.

Exercise 14.4.2 Let us consider the line $r \subset \mathbb{A}^2$ with parametric equation

\[ r : \begin{cases} x = 1 + \lambda \\ y = 2 - \lambda \end{cases} \]

We can express the parameter $\lambda$ in terms of $x$ from the first relation, that is $\lambda = x - 1$, and replace this in the second relation, having

\[ x + y - 3 = 0. \]

We set

\[ s = \{(x, y) \in \mathbb{A}^2 : x + y - 3 = 0\} \]

and show that $s$ coincides with $r$. Clearly $r \subseteq s$, since a point with coordinates $(1 + \lambda, 2 - \lambda) \in r$ solves the linear equation for $s$:

\[ (1 + \lambda) + (2 - \lambda) - 3 = 0. \]

In order to prove that $s \subseteq r$, consider a point $P = (x, y) \in s$, so that $P = (x, y = 3 - x)$ for any value of $x$: this means considering $x$ as a real parameter. By writing $\lambda = x - 1$, we have $P = (x = \lambda + 1, y = 2 - \lambda)$ for any $\lambda \in \mathbb{R}$, so $P \in r$. We have then $s = r$ as linear affine varieties.
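The two inclusions of the exercise above can also be checked numerically. The following is only a sketch in Python: the function names `point_on_r`, `on_s` and `parameter_of` are illustrative choices, not notation from the text, and the check runs over a finite sample of rational parameter values rather than over all of $\mathbb{R}$.

```python
from fractions import Fraction

def point_on_r(lam):
    """Point of r for the parameter value lam: (x, y) = (1 + lam, 2 - lam)."""
    return (1 + lam, 2 - lam)

def on_s(x, y):
    """True when (x, y) solves the cartesian equation x + y - 3 = 0."""
    return x + y - 3 == 0

def parameter_of(x, y):
    """Parameter lam = x - 1 obtained from the first parametric relation."""
    return x - 1

# A finite sample of rational values (an assumption made for illustration).
samples = [Fraction(k, 2) for k in range(-6, 7)]

# r is contained in s: every sampled point of r solves the equation of s.
r_in_s = all(on_s(*point_on_r(lam)) for lam in samples)

# s is contained in r: a point (x, 3 - x) of s is recovered with lam = x - 1.
s_in_r = all(point_on_r(parameter_of(x, 3 - x)) == (x, 3 - x) for x in samples)
```

Exact rational arithmetic is used so that the equalities are tested without rounding errors.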

Proposition 14.4.3 Given $a, b, c$ in $\mathbb{R}$ with $(a, b) \neq (0, 0)$, the solutions of the equation

\[ \Sigma_r : ax + by + c = 0 \qquad (14.8) \]

provide the coordinates of all the points $P = (x, y)$ of a line $r$ in $\mathbb{A}^2$ whose direction $S_r = \mathcal{L}((-b, a))$ is given by the solutions of the associated linear homogeneous equation

\[ \Sigma_{r_0} : ax + by = 0. \]

Moreover, if $r \subset \mathbb{A}^2$ is a line with direction $S_r = \mathcal{L}((-b, a))$, then there exists $c \in \mathbb{R}$ such that the cartesian form for the equation of $r$ is given by (14.8).

Proof We start by showing that the solutions of (14.8) give the coordinates of the points representing the line with direction $\mathcal{L}((-b, a))$ in parametric form.

Let us assume $a \neq 0$. We can then write the space of the solutions for (14.8) as

\[ (x, y) = \left(-\frac{b}{a}\mu - \frac{c}{a},\ \mu\right) \]

where $\mu \in \mathbb{R}$ is a parameter. By rescaling the parameter, that is defining $\lambda = \mu/a$, we write the space of solutions as the points having coordinates

\[ (x, y) = \left(-b\lambda - \frac{c}{a},\ a\lambda\right) = \left(-\frac{c}{a},\ 0\right) + \lambda(-b, a). \]

This expression gives the vector (and the parametric) equation of a line through $(-c/a, 0)$ with direction $S_r = \mathcal{L}((-b, a))$.

If $a = 0$, we can write the space of the solutions for (14.8) as

\[ (x, y) = \left(\mu,\ -\frac{c}{b}\right) \]

where $\mu \in \mathbb{R}$ is a parameter. By rescaling the parameter, we can write

\[ (x, y) = \left(-\lambda b,\ -\frac{c}{b}\right) = \left(0,\ -\frac{c}{b}\right) + \lambda(-b, 0), \]

giving the vector equation of a line through the point $(0, -c/b)$ with direction $S_r = \mathcal{L}((-b, 0))$.

Now let $r$ be a line in $\mathbb{A}^2$ with direction $S_r = \mathcal{L}((-b, a))$. Its parametric equation is of the form

\[ r : \begin{cases} x = x_0 - b\lambda \\ y = y_0 + a\lambda \end{cases} \]

where $(x_0, y_0)$ is an arbitrary point in $\mathbb{A}^2$. If $a \neq 0$, we can eliminate $\lambda$ by setting

\[ \lambda = \frac{y - y_0}{a} \]

from the second relation and then

\[ x = x_0 - \frac{b}{a}(y - y_0), \]

resulting in the linear equation

\[ ax + by + c = 0 \]

with $c = -(ax_0 + by_0)$.

If $a = 0$ then $b \neq 0$, so by rescaling the parameter as $\mu = x_0 - \lambda b$, the points of the line $r$ are $(x = \mu, y = y_0)$. This is indeed the set of the solutions of the linear equation

\[ ax + by + c = 0 \]

with $a = 0$ and $c = -by_0$. We have then shown that any line with a given direction has the cartesian form given by a suitable linear equation (14.8). □

The  equation  ax  +  by  +  c  =  0  is  called  the  cartesian  equation  of  a  line  in  A2. 

Remark 14.4.4 As already mentioned, a line does not uniquely determine its cartesian equation. With $ax + by + c = 0$ the cartesian equation for $r$, any other linear equation

\[ \rho a x + \rho b y + \rho c = 0, \quad \text{with } \rho \neq 0 \]

yields a cartesian equation for the same line, since

\[ \rho a x + \rho b y + \rho c = 0 \iff \rho(ax + by + c) = 0 \iff ax + by + c = 0. \]

Exercise 14.4.5 The line $\Sigma_r : 2x - y + 3 = 0$ in $\mathbb{A}^2$ has direction $\Sigma_{r_0} : 2x - y = 0$, or $S_r = \mathcal{L}((1, 2))$.
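Proposition 14.4.3 reads the direction of a line directly off its cartesian equation. A minimal Python sketch of this rule follows; the function name `direction` is a hypothetical helper introduced here only for illustration.

```python
def direction(a, b):
    """Generator (-b, a) of the direction of the line a*x + b*y + c = 0.

    The constant term c plays no role: the direction is the solution
    space of the homogeneous equation a*x + b*y = 0.
    """
    if (a, b) == (0, 0):
        raise ValueError("(a, b) must differ from (0, 0)")
    return (-b, a)

# The line 2x - y + 3 = 0 of Exercise 14.4.5.
a, b = 2, -1
v = direction(a, b)

# v solves the associated homogeneous equation 2x - y = 0.
homogeneous_value = a * v[0] + b * v[1]
```

For $2x - y + 3 = 0$ the helper returns $(1, 2)$, in agreement with $S_r = \mathcal{L}((1, 2))$.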

Exercise 14.4.6 We turn now to the description of a plane in the three dimensional affine space in terms of a cartesian equation. Let us consider the plane $\pi \subset \mathbb{A}^3$ with parametric equation

\[ \pi : \begin{cases} x = 1 + 2\lambda + \mu \\ y = 2 - \lambda - \mu \\ z = \mu \end{cases} \]

We eliminate the parameter $\mu$ by setting $\mu = z$ from the third relation, and write

\[ \pi : \begin{cases} x = 1 + 2\lambda + z \\ y = 2 - \lambda - z \\ \mu = z \end{cases} \]

We can then eliminate the parameter $\lambda$ by using the second (for example) relation, so as to have $\lambda = 2 - y - z$ and write

\[ \pi : \begin{cases} x = 1 + 2(2 - y - z) + z \\ \lambda = 2 - y - z \\ \mu = z \end{cases} \]

Since these relations are valid for any choice of the parameters $\lambda$ and $\mu$, we have a resulting linear equation with three unknowns:

\[ \Sigma_\pi : x + 2y + z - 5 = 0. \]

Such an equation still represents $\pi$, since every point $P \in \pi$ solves the equation (as easily seen by taking $P = (1 + 2\lambda + \mu, 2 - \lambda - \mu, \mu)$) and the space of solutions of $\Sigma_\pi$ coincides with the set $\pi$.
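The elimination carried out in this exercise can be double-checked with a short Python sketch. The names `point_on_plane`, `cartesian` and `parameters_of` are illustrative assumptions, and the check is made on a finite grid of rational values only.

```python
from fractions import Fraction
from itertools import product

def point_on_plane(lam, mu):
    """Point of the plane for the parameters (lam, mu)."""
    return (1 + 2 * lam + mu, 2 - lam - mu, mu)

def cartesian(x, y, z):
    """Left-hand side of the cartesian equation x + 2y + z - 5 = 0."""
    return x + 2 * y + z - 5

def parameters_of(x, y, z):
    """Parameters recovered by elimination: lam = 2 - y - z, mu = z."""
    return (2 - y - z, z)

values = [Fraction(k, 3) for k in range(-6, 7)]

# Every sampled point of the plane solves the cartesian equation.
plane_in_solutions = all(
    cartesian(*point_on_plane(lam, mu)) == 0
    for lam, mu in product(values, values)
)

# Every sampled solution (x = 5 - 2y - z, y, z) is a point of the plane.
solutions_in_plane = all(
    point_on_plane(*parameters_of(5 - 2 * y - z, y, z)) == (5 - 2 * y - z, y, z)
    for y, z in product(values, values)
)
```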
This example has a general validity for representing in cartesian form a plane in $\mathbb{A}^3$. A natural generalisation of the proof of the previous Proposition 14.4.3 allows one to show the following result.

Proposition 14.4.7 Given $a, b, c, d$ in $\mathbb{R}$ with $(a, b, c) \neq (0, 0, 0)$, the solutions of the equation

\[ \Sigma_\pi : ax + by + cz + d = 0 \qquad (14.9) \]

provide the coordinates of all the points $P = (x, y, z)$ of a plane $\pi$ in $\mathbb{A}^3$ whose direction $S_\pi$ is given by the solutions of the associated linear homogeneous equation

\[ \Sigma_{\pi_0} : ax + by + cz = 0. \qquad (14.10) \]

If $\pi \subset \mathbb{A}^3$ is a plane with direction $S_\pi \cong \mathbb{R}^2$ given by the space of the solutions of (14.10), then there exists $d \in \mathbb{R}$ such that the cartesian form for the equation of $\pi$ is given by (14.9).

The  equation 

ax  +  by  +  cz  +  d  =  0 

is  called  the  cartesian  equation  of  a  plane  in  A3. 

Remark 14.4.8 Analogously to what we noticed in Remark 14.4.4, the cartesian equation of a plane $\pi$ in $\mathbb{A}^3$ is not uniquely determined, since it can again be multiplied by a non zero scalar.

Exercise 14.4.9 We next look for a cartesian equation for a line in $\mathbb{A}^3$. As usual, by way of an example, we start by considering the parametric equation of the line $r \subset \mathbb{A}^3$ given by

\[ r : \begin{cases} x = 1 + 2\lambda \\ y = 2 - \lambda \\ z = \lambda \end{cases} \]

By eliminating the parameter $\lambda$ via (for example) the third relation $\lambda = z$ we have

\[ r : \begin{cases} x = 1 + 2z \\ y = 2 - z \\ \lambda = z \end{cases} \]

Since the third relation formally amounts to redefining the parameter, we write

\[ \Sigma_r : \begin{cases} x - 2z - 1 = 0 \\ y + z - 2 = 0 \end{cases} \]

which is a linear system with three unknowns and rank 2, thus having $\infty^1$ solutions. In analogy with the procedure used above for the other examples, it is easy to show that the space of solutions of $\Sigma_r$ coincides with the line $r$ in $\mathbb{A}^3$.
The following result is the natural generalisation of Propositions 14.4.3 and 14.4.7.

Proposition 14.4.10 Given the (complete, see Definition 6.1.5) matrix

\[ (A, B) = \begin{pmatrix} a_1 & b_1 & c_1 & -d_1 \\ a_2 & b_2 & c_2 & -d_2 \end{pmatrix} \in \mathbb{R}^{2,4} \]

with

\[ \mathrm{rk}(A) = \mathrm{rk}\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix} = 2, \]

the solutions of the linear system

\[ \Sigma_r : AX = B \iff \begin{cases} a_1 x + b_1 y + c_1 z + d_1 = 0 \\ a_2 x + b_2 y + c_2 z + d_2 = 0 \end{cases} \qquad (14.11) \]

provide the coordinates of all the points $P = (x, y, z)$ of a line $r$ in $\mathbb{A}^3$ whose direction $S_r$ is given by the solutions of the associated linear homogeneous system

\[ \Sigma_{r_0} : AX = 0. \qquad (14.12) \]

If $r \subset \mathbb{A}^3$ is a line whose direction $S_r \cong \mathbb{R}$ is given by the space of the solutions of the linear homogeneous system (14.12) with $A \in \mathbb{R}^{2,3}$ and $\mathrm{rk}(A) = 2$, then there exists a vector $B = {}^t(-d_1, -d_2)$ such that the cartesian form for the equation of $r$ is given by (14.11).

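The linear system

\[ \begin{cases} a_1 x + b_1 y + c_1 z + d_1 = 0 \\ a_2 x + b_2 y + c_2 z + d_2 = 0 \end{cases} \quad \text{with} \quad \mathrm{rk}\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix} = 2 \]

is called the cartesian equation of the line $r$ in $\mathbb{A}^3$.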


Remark 14.4.11 We notice again that the cartesian form (14.11) is not uniquely determined by the line $r$, since any linear system $\Sigma'_r$ which is equivalent to $\Sigma_r$ describes the same line.

We now give a few examples of linear affine varieties described by cartesian equations obtained by removing the parameters in their parametric equations.

Exercise 14.4.12 We consider the hyperplane $H$ in $\mathbb{A}^4$ with parametric equation

\[ H : \begin{cases} x = 1 + \lambda + \mu + \nu \\ y = \lambda - \mu \\ z = \mu + \nu \\ t = \nu \end{cases} \]

Let us eliminate the parameters: we start by eliminating $\nu$ via the fourth relation, then $\mu$ via the third relation and eventually $\lambda$ via the second relation. We have then

\[ \begin{cases} x = 1 + \lambda + \mu + t \\ y = \lambda - \mu \\ z = \mu + t \\ \nu = t \end{cases} \Rightarrow \begin{cases} x = 1 + \lambda + (z - t) + t \\ y = \lambda - (z - t) \\ \mu = z - t \\ \nu = t \end{cases} \Rightarrow \begin{cases} x = 1 + (y + z - t) + (z - t) + t \\ \lambda = y + z - t \\ \mu = z - t \\ \nu = t \end{cases} \]

As we have noticed previously, since these relations are valid for each value of the parameters $\lambda, \mu, \nu$, the computations amount to a redefinition of the parameters to $y, z, t$, so we consider only the first relation, and write

\[ \Sigma_H : x - y - 2z + t - 1 = 0 \]

as the cartesian equation of the hyperplane $H$ in $\mathbb{A}^4$ with the starting parametric equation. The direction $S_H \cong \mathbb{R}^3$ of such a hyperplane is given by the vector space corresponding to the space of the solutions of the homogeneous linear equation

\[ \Sigma_{H_0} : x - y - 2z + t = 0. \]

Exercise 14.4.13 We consider the plane $\pi$ in $\mathbb{A}^3$ whose vector equation is given by

\[ \pi : P = Q + \lambda v_1 + \mu v_2 \]

with $Q = (2, 3, 0)$ and $v_1 = (1, 0, 1)$, $v_2 = (1, -1, 0)$. By denoting the coordinates $P = (x, y, z)$ we write

\[ (x, y, z) = (2, 3, 0) + \lambda(1, 0, 1) + \mu(1, -1, 0), \]

which reads as the parametric equation

\[ \pi : \begin{cases} x = 2 + \lambda + \mu \\ y = 3 - \mu \\ z = \lambda \end{cases} \]

If we eliminate the parameters we write

\[ \pi : \begin{cases} \lambda = z \\ \mu = 3 - y \\ x = 2 + z + 3 - y \end{cases} \]

so as to have the following cartesian equation for $\pi$:

\[ \Sigma_\pi : x + y - z - 5 = 0. \]

The direction $S_\pi \cong \mathbb{R}^2$ of the plane $\pi$ is the space of the solutions of the homogeneous equation

\[ \Sigma_{\pi_0} : x + y - z = 0, \]

and it is easy to see that $S_\pi = \mathcal{L}(v_1, v_2)$.

Exercise 14.4.14 We consider the line $r : P = Q + \lambda v$ in $\mathbb{A}^4$, with $Q = (1, 2, 2, 1)$ and direction vector $v = (1, -1, 2, 1)$. Its parametric equation is given by

\[ r : \begin{cases} x_1 = 1 + \lambda \\ x_2 = 2 - \lambda \\ x_3 = 2 + 2\lambda \\ x_4 = 1 + \lambda \end{cases} \]

If we use the first relation to eliminate the parameter $\lambda$, we write

\[ r : \begin{cases} \lambda = x_1 - 1 \\ x_2 = 2 - (x_1 - 1) \\ x_3 = 2 + 2(x_1 - 1) \\ x_4 = 1 + (x_1 - 1) \end{cases} \]

which amounts to the following cartesian equation

\[ \Sigma_r : \begin{cases} x_1 + x_2 - 3 = 0 \\ 2x_1 - x_3 = 0 \\ x_1 - x_4 = 0 \end{cases} \]

Again, the direction $S_r \cong \mathbb{R}$ of the line $r$ is given by the space of the solutions for the homogeneous linear system

\[ \Sigma_{r_0} : \begin{cases} x_1 + x_2 = 0 \\ 2x_1 - x_3 = 0 \\ x_1 - x_4 = 0 \end{cases} \]

It is easy to see that $S_r = \mathcal{L}(v)$.
Exercise 14.4.15 We consider the plane $\pi \subset \mathbb{A}^3$ whose cartesian equation is

\[ \Sigma_\pi : 2x - y + z - 1 = 0. \]

By choosing as free unknowns $x, y$, we have $z = -2x + y + 1$, that is $P = (x, y, z) \in \pi$ if and only if

\[ (x, y, z) = (a, b, -2a + b + 1) = (0, 0, 1) + a(1, 0, -2) + b(0, 1, 1) \]

for any choice of the real parameters $a, b$. The latter relation is then the vector equation of $\pi$.
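The passage from the cartesian equation to the vector equation in this exercise can be verified with a small Python sketch. The names `vector_equation` and `lhs` are hypothetical helpers, and only a finite set of integer parameters is tested.

```python
from itertools import product

BASE = (0, 0, 1)     # point of the plane obtained for a = b = 0
V1 = (1, 0, -2)      # direction vector multiplying the parameter a
V2 = (0, 1, 1)       # direction vector multiplying the parameter b

def vector_equation(a, b):
    """Point BASE + a*V1 + b*V2 of the plane."""
    return tuple(q + a * s + b * t for q, s, t in zip(BASE, V1, V2))

def lhs(x, y, z):
    """Left-hand side of the cartesian equation 2x - y + z - 1 = 0."""
    return 2 * x - y + z - 1

params = list(product(range(-5, 6), repeat=2))

# Each generated point solves the cartesian equation ...
all_on_plane = all(lhs(*vector_equation(a, b)) == 0 for a, b in params)

# ... and has the form (a, b, -2a + b + 1) found by choosing x, y as free unknowns.
same_form = all(vector_equation(a, b) == (a, b, -2 * a + b + 1) for a, b in params)
```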

Exercise 14.4.16 We consider the line $r \subset \mathbb{A}^3$ with cartesian equation

\[ \Sigma_r : \begin{cases} x - y + z - 1 = 0 \\ 2x + y + 2 = 0 \end{cases} \]

In order to have a vector equation for $r$ we solve such a linear system, getting

\[ \Sigma_r : \begin{cases} y = -2x - 2 \\ z = -3x - 1 \end{cases} \]

Then the space of the solutions for $\Sigma_r$ is given by the elements

\[ (x, y, z) = (a, -2a - 2, -3a - 1) = (0, -2, -1) + a(1, -2, -3). \]

This relation yields a vector equation for $r$.

We conclude this section by rewriting Proposition 14.4.1, whose formulation should now appear clearer.

Proposition 14.4.17 Given the matrix

\[ (A, B) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & -b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & -b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & -b_m \end{pmatrix} \in \mathbb{R}^{m,n+1} \]

with

\[ \mathrm{rk}(A) = \mathrm{rk}\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} = m < n, \]

the solutions of the linear system

\[ \Sigma_L : AX = B \iff \begin{cases} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n + b_1 = 0 \\ \quad \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n + b_m = 0 \end{cases} \qquad (14.13) \]

give the coordinates of all points $P = (x_1, x_2, \dots, x_n)$ of a linear affine variety $L$ in $\mathbb{A}^n$ of dimension $k = n - m$ and whose direction $S_L$ is given by the solutions of the associated linear homogeneous system

\[ \Sigma_{L_0} : AX = 0. \qquad (14.14) \]

If $L \subset \mathbb{A}^n$ is a linear affine variety of dimension $k$, whose direction $S_L \cong \mathbb{R}^k$ is the space of solutions of the linear homogeneous system $AX = 0$ with $A \in \mathbb{R}^{m,n}$ and $\mathrm{rk}(A) = m < n$, then there is a vector $B = {}^t(-b_1, \dots, -b_m)$ such that the cartesian form for the equation of $L$ is given by (14.13).


14.5  Intersection  of  Linear  Affine  Varieties 


In  this  section,  by  studying  particular  examples,  we  introduce  some  aspects  of  the 
general  problem  of  the  intersection  (that  is  of  the  mutual  position)  of  different  linear 
affine  varieties. 

14.5.1  Intersection  of  two  lines  in  A2 

Let $r$ and $r'$ be the lines in $\mathbb{A}^2$ given by the cartesian equations

\[ \Sigma_r : ax + by + c = 0; \qquad \Sigma_{r'} : a'x + b'y + c' = 0. \]

Their intersection is given by the solutions of the linear system

\[ \Sigma_{r \cap r'} : \begin{cases} ax + by = -c \\ a'x + b'y = -c' \end{cases} \]

By defining

\[ A = \begin{pmatrix} a & b \\ a' & b' \end{pmatrix}, \qquad (A, B) = \begin{pmatrix} a & b & -c \\ a' & b' & -c' \end{pmatrix} \]

the matrices associated to such a linear system, we have three different possibilities:

• if $\mathrm{rk}(A) = \mathrm{rk}((A, B)) = 1$, the system $\Sigma_{r \cap r'}$ is solvable, with the space of solutions $S_{\Sigma_{r \cap r'}}$ containing $\infty^1$ solutions. This means that $r = r'$, the two lines coincide;

• if $\mathrm{rk}(A) = \mathrm{rk}((A, B)) = 2$, the system $\Sigma_{r \cap r'}$ is solvable, with the space of solutions $S_{\Sigma_{r \cap r'}}$ made of only one solution, the point $P = (x_0, y_0)$ of intersection between the lines $r$ and $r'$;

• if $\mathrm{rk}(A) = 1$ and $\mathrm{rk}((A, B)) = 2$, the system $\Sigma_{r \cap r'}$ is not solvable, which means that $r \cap r' = \emptyset$; the lines $r$ and $r'$ are therefore parallel, with common direction given by $\mathcal{L}((-b, a))$.

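We can summarise such cases as in the following table

\[ \begin{array}{cccc} \mathrm{rk}(A) & \mathrm{rk}((A, B)) & S_{\Sigma_{r \cap r'}} & r \cap r' \\ \hline 1 & 1 & \infty^1 & r = r' \\ 2 & 2 & 1 & P = (x_0, y_0) \\ 1 & 2 & \emptyset & \emptyset \end{array} \]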
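The rank analysis summarised in the table lends itself to a direct computation. What follows is a sketch in Python under stated assumptions: the functions `rank` and `mutual_position` are illustrative helpers (not part of the text), and the rank is computed by Gaussian elimination with exact rational arithmetic.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        # Look for a non-zero pivot in this column, at or below row rk.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def mutual_position(line1, line2):
    """Classify two lines a*x + b*y + c = 0 of the plane, given as (a, b, c)."""
    (a, b, c), (a2, b2, c2) = line1, line2
    rk_a = rank([[a, b], [a2, b2]])
    rk_ab = rank([[a, b, -c], [a2, b2, -c2]])
    if rk_a == 2:
        return "one point"      # rk(A) = rk((A, B)) = 2
    if rk_ab == 1:
        return "coincident"     # rk(A) = rk((A, B)) = 1
    return "parallel"           # rk(A) = 1, rk((A, B)) = 2

example = mutual_position((1, 1, -1), (1, 2, 2))
```

Here `example` concerns the lines $x + y - 1 = 0$ and $x + 2y + 2 = 0$; the helper assumes that each triple has $(a, b) \neq (0, 0)$, so that it really describes a line.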

The  following  result  comes  easily  from  the  analysis  above. 

Corollary 14.5.2 Given the lines $r$ and $r'$ in $\mathbb{A}^2$ with cartesian equations $\Sigma_r : ax + by + c = 0$ and $\Sigma_{r'} : a'x + b'y + c' = 0$, we have that

\[ r = r' \iff \mathrm{rk}\begin{pmatrix} a & b & -c \\ a' & b' & -c' \end{pmatrix} = 1. \]

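Exercise 14.5.3 Given the lines $r$ and $s$ in $\mathbb{A}^2$ whose cartesian equations are

\[ \Sigma_r : x + y - 1 = 0, \qquad \Sigma_s : x + 2y + 2 = 0, \]

we study their mutual position. We consider therefore the linear system

\[ \Sigma_{r \cap s} : \begin{cases} x + y = 1 \\ x + 2y = -2 \end{cases} \]

The reduction

\[ (A, B) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & -2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & -3 \end{pmatrix} = (A', B') \]

proves that $\mathrm{rk}(A, B) = \mathrm{rk}(A', B') = 2$ and $\mathrm{rk}(A) = \mathrm{rk}(A') = 2$. The lines $r$ and $s$ have a unique point of intersection, which is computed to be $r \cap s = \{(4, -3)\}$.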

Exercise 14.5.4 Consider the lines $r$ and $s_\alpha$ given by their cartesian equations

\[ \Sigma_r : x + y - 1 = 0, \qquad \Sigma_{s_\alpha} : x + \alpha y + 2 = 0 \]

with $\alpha \in \mathbb{R}$ a parameter. We study the mutual position of $r$ and $s_\alpha$ as depending on the value of $\alpha$. We therefore study the linear system

\[ \Sigma_{r \cap s_\alpha} : \begin{cases} x + y = 1 \\ x + \alpha y = -2 \end{cases} \]

We use the reduction

\[ (A, B) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \alpha & -2 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 1 & 1 \\ 0 & \alpha - 1 & -3 \end{pmatrix} = (A', B'), \]

proving that $\mathrm{rk}(A, B) = \mathrm{rk}(A', B') = 2$ for any value of $\alpha$, while $\mathrm{rk}(A) = \mathrm{rk}(A') = 2$ if and only if $\alpha \neq 1$. This means that $r$ is parallel to $s_\alpha$ if and only if $\alpha = 1$ (being in such a case $\Sigma_{s_1} : x + y + 2 = 0$), while for any $\alpha \neq 1$ the two lines intersect in one point, whose coordinates are computed to be

\[ r \cap s_\alpha = \left( \frac{\alpha + 2}{\alpha - 1},\ -\frac{3}{\alpha - 1} \right). \]


The following examples show how to study the mutual position of two lines which are not given in the cartesian form. They present different methods that do not require explicitly transforming a parametric or a vector equation into its cartesian form.

Exercise 14.5.5 We consider the line $r$ in $\mathbb{A}^2$ with vector equation

\[ r : (x, y) = (1, 2) + \lambda(1, -1), \]

and the line $s$ whose cartesian equation is

\[ \Sigma_s : 2x - y - 6 = 0. \]

These lines intersect at the points of $r$ corresponding to those values of the parameter $\lambda$ for which the coordinates solve the equation $\Sigma_s$. From

\[ r : \begin{cases} x = 1 + \lambda \\ y = 2 - \lambda \end{cases} \]

we have

\[ 2(1 + \lambda) - (2 - \lambda) - 6 = 0 \Rightarrow \lambda = 2. \]

This means that $r$ and $s$ intersect in one point, the one with coordinates $(x = 3, y = 0)$.
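The substitution used in this exercise works for any line given by a vector equation and any line given by a cartesian equation. A minimal Python sketch, where the function `intersect` and its return convention are assumptions made for illustration and integer input data are assumed:

```python
from fractions import Fraction

def intersect(point, direction, line):
    """Intersect P = point + lam * direction with a*x + b*y + c = 0.

    Substituting the parametric coordinates into the cartesian equation
    gives coeff * lam + const = 0. Returns ("point", (x, y)) for a unique
    solution, ("line", None) when every lam is a solution, and
    ("none", None) when there is no solution.
    """
    (x0, y0), (vx, vy), (a, b, c) = point, direction, line
    coeff = a * vx + b * vy
    const = a * x0 + b * y0 + c
    if coeff != 0:
        lam = Fraction(-const, coeff)
        return ("point", (x0 + lam * vx, y0 + lam * vy))
    return ("line", None) if const == 0 else ("none", None)

# Data of Exercise 14.5.5: r through (1, 2) with direction (1, -1), s : 2x - y - 6 = 0.
kind, P = intersect((1, 2), (1, -1), (2, -1, -6))
```

When `coeff` vanishes the direction of the first line solves the homogeneous equation of the second, so the two lines are either coincident or parallel.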

Exercise 14.5.6 As in the exercise above we consider the line $r$ given by the vector equation

\[ r : (x, y) = (1, -1) + \lambda(2, -1) \]

and the line $s$ given by the cartesian equation

\[ \Sigma_s : x + 2y - 3 = 0. \]

Their intersections correspond to the values of the parameter $\lambda$ which solve the equation

\[ (1 + 2\lambda) + 2(-1 - \lambda) - 3 = 0 \Rightarrow -4 = 0. \]

This means that $r \cap s = \emptyset$; these two lines are parallel.

Exercise 14.5.7 Consider the lines $r$ and $s$ in $\mathbb{A}^2$ both given by a vector equation, for example

\[ r : (x, y) = (1, 0) + \lambda(1, -2), \qquad s : (x, y) = (1, -1) + \mu(-1, 1). \]

The intersection $r \cap s$ corresponds to values of the parameters $\lambda$ and $\mu$ for which the coordinates of a point in $r$ coincide with those of a point in $s$. We have then to solve the linear system

\[ \begin{cases} 1 + \lambda = 1 - \mu \\ -2\lambda = -1 + \mu \end{cases} \Rightarrow \begin{cases} \lambda = -\mu \\ 2\mu = -1 + \mu \end{cases} \Rightarrow \begin{cases} \lambda = 1 \\ \mu = -1 \end{cases} \]

Since this linear system has a unique solution, the intersection is $s \cap r = \{P\}$, where the point $P$ corresponds to the value $\lambda = 1$ in $r$ or equivalently to the value $\mu = -1$ in $s$. Then $r \cap s = \{(2, -2)\}$.

Exercise 14.5.8 As in the previous exercise, we study the intersection of the lines

\[ r : (x, y) = (1, 1) + \lambda(-1, 2), \qquad s : (x, y) = (1, 2) + \mu(1, -2). \]

We proceed as above, and consider the linear system

\[ \begin{cases} 1 - \lambda = 1 + \mu \\ 1 + 2\lambda = 2 - 2\mu \end{cases} \Rightarrow \begin{cases} -\lambda = \mu \\ 1 - 2\mu = 2 - 2\mu \end{cases} \Rightarrow \begin{cases} -\lambda = \mu \\ 1 = 2 \end{cases} \]

Since this linear system is not solvable, we conclude that $r$ does not intersect $s$, and since the directions of $r$ and $s$ coincide, we have that $r$ is parallel to $s$.

14.5.9  Intersection  of  two  planes  in  A3 

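Consider the planes $\pi$ and $\pi'$ in $\mathbb{A}^3$ with cartesian equations given by

\[ \Sigma_\pi : ax + by + cz + d = 0, \qquad \Sigma_{\pi'} : a'x + b'y + c'z + d' = 0. \]

Their intersection is given by the solutions of the linear system

\[ \Sigma_{\pi \cap \pi'} : \begin{cases} ax + by + cz + d = 0 \\ a'x + b'y + c'z + d' = 0 \end{cases} \]

which is characterized by the matrices

\[ A = \begin{pmatrix} a & b & c \\ a' & b' & c' \end{pmatrix}, \qquad (A, B) = \begin{pmatrix} a & b & c & -d \\ a' & b' & c' & -d' \end{pmatrix}. \]

We have the following possible cases.

\[ \begin{array}{cccc} \mathrm{rk}(A) & \mathrm{rk}((A, B)) & S_{\Sigma_{\pi \cap \pi'}} & \pi \cap \pi' \\ \hline 1 & 1 & \infty^2 & \pi = \pi' \\ 2 & 2 & \infty^1 & \text{line} \\ 1 & 2 & \emptyset & \emptyset \end{array} \]

Notice that the case $\pi \cap \pi' = \emptyset$ corresponds to $\pi$ parallel to $\pi'$.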


The following corollary parallels Corollary 14.5.2.

Corollary 14.5.10 Consider two planes $\pi$ and $\pi'$ in $\mathbb{A}^3$ having cartesian equations $\Sigma_\pi : ax + by + cz + d = 0$ and $\Sigma_{\pi'} : a'x + b'y + c'z + d' = 0$. One has

\[ \pi = \pi' \iff \mathrm{rk}\begin{pmatrix} a & b & c & -d \\ a' & b' & c' & -d' \end{pmatrix} = 1. \]

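Exercise 14.5.11 We consider the planes $\pi$ and $\pi'$ in $\mathbb{A}^3$ whose cartesian equations are

\[ \Sigma_\pi : x - y + 3z + 2 = 0, \qquad \Sigma_{\pi'} : x - y + z + 1 = 0. \]

The intersection is given by the solutions of the system

\[ \Sigma_{\pi \cap \pi'} : \begin{cases} x - y + 3z = -2 \\ x - y + z = -1 \end{cases} \]

By reducing the complete matrix of such a linear system,

\[ (A, B) = \begin{pmatrix} 1 & -1 & 3 & -2 \\ 1 & -1 & 1 & -1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -1 & 3 & -2 \\ 0 & 0 & 2 & -1 \end{pmatrix}, \]

we see that $\mathrm{rk}(A, B) = \mathrm{rk}(A) = 2$, so the linear system has $\infty^1$ solutions. The intersection $\pi \cap \pi'$ is therefore a line with cartesian equation given by $\Sigma_{\pi \cap \pi'}$.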

Exercise 14.5.12 We consider the planes $\pi$ and $\pi'$ in $\mathbb{A}^3$ given by

\[ \Sigma_\pi : x - y + z + 2 = 0, \qquad \Sigma_{\pi'} : 2x - 2y + 2z + 1 = 0. \]

As in the previous exercise, we reduce the complete matrix of the linear system $\Sigma_{\pi \cap \pi'}$,

\[ (A, B) = \begin{pmatrix} 1 & -1 & 1 & -2 \\ 2 & -2 & 2 & -1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -1 & 1 & -2 \\ 0 & 0 & 0 & 3 \end{pmatrix}, \]

to get $\mathrm{rk}(A) = 1$ while $\mathrm{rk}(A, B) = 2$, so $\pi \cap \pi' = \emptyset$. Since these planes are in $\mathbb{A}^3$, they are parallel.

Exercise 14.5.13 We consider the planes $\pi, \pi', \pi''$ in $\mathbb{A}^3$ whose cartesian equations are given by

\[ \Sigma_\pi : x - 2y - z + 1 = 0, \qquad \Sigma_{\pi'} : x + y - 2 = 0, \qquad \Sigma_{\pi''} : 2x - 4y - 2z - 5 = 0. \]

For the mutual positions of the pairs $\pi, \pi'$ and $\pi, \pi''$, we start by considering the linear system

\[ \Sigma_{\pi \cap \pi'} : \begin{cases} x - 2y - z = -1 \\ x + y = 2 \end{cases} \]

For the complete matrix

\[ (A, B) = \begin{pmatrix} 1 & -2 & -1 & -1 \\ 1 & 1 & 0 & 2 \end{pmatrix} \]

we easily see that $\mathrm{rk}(A) = \mathrm{rk}(A, B) = 2$, so the intersection $\pi \cap \pi'$ is the line whose cartesian equation is the linear system $\Sigma_{\pi \cap \pi'}$.

For the intersection of $\pi$ with $\pi''$ we consider the linear system

\[ \Sigma_{\pi \cap \pi''} : \begin{cases} x - 2y - z = -1 \\ 2x - 4y - 2z = 5 \end{cases} \]

The complete matrix

\[ (A, B) = \begin{pmatrix} 1 & -2 & -1 & -1 \\ 2 & -4 & -2 & 5 \end{pmatrix} \]

has $\mathrm{rk}(A) = 1$ and $\mathrm{rk}(A, B) = 2$. This means that $\Sigma_{\pi \cap \pi''}$ has no solutions, that is the planes $\pi$ and $\pi''$ are parallel, having the same direction given by the vector space of the solutions of $\Sigma_{\pi_0} : x - 2y - z = 0$.
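14.5.14  Intersection  of  a  line  with  a  plane  in  A3

We consider the line $r$ and the plane $\pi$ in $\mathbb{A}^3$ given by the cartesian equations

\[ \Sigma_r : \begin{cases} a_1 x + b_1 y + c_1 z + d_1 = 0 \\ a_2 x + b_2 y + c_2 z + d_2 = 0 \end{cases} \qquad \Sigma_\pi : ax + by + cz + d = 0. \]

Again, their intersection is given by the solutions of the linear system

\[ \Sigma_{\pi \cap r} : \begin{cases} a_1 x + b_1 y + c_1 z = -d_1 \\ a_2 x + b_2 y + c_2 z = -d_2 \\ ax + by + cz = -d \end{cases} \]

with its associated matrices

\[ A = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a & b & c \end{pmatrix}, \qquad (A, B) = \begin{pmatrix} a_1 & b_1 & c_1 & -d_1 \\ a_2 & b_2 & c_2 & -d_2 \\ a & b & c & -d \end{pmatrix}. \]

Since the upper two row vectors of both the matrices $A$ and $(A, B)$ are linearly independent, because the corresponding equations represent a line in $\mathbb{A}^3$, only the following cases are possible.

\[ \begin{array}{cccc} \mathrm{rk}(A) & \mathrm{rk}((A, B)) & S_{\Sigma_{\pi \cap r}} & \pi \cap r \\ \hline 2 & 2 & \infty^1 & r \\ 3 & 3 & \infty^0 & \text{point} \\ 2 & 3 & \emptyset & \emptyset \end{array} \]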

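Notice that, when $\mathrm{rk}(A) = \mathrm{rk}(A, B) = 2$, it is $r \subset \pi$, while, if $\mathrm{rk}(A) = 2$ and $\mathrm{rk}(A, B) = 3$, then $r$ is parallel to $\pi$. Indeed, when $\mathrm{rk}(A) = 2$, then $S_r \subset S_\pi$, the direction of $r$ is a subspace of the direction of $\pi$. In order to show this, we consider the linear systems for the directions $S_r$ and $S_\pi$,

\[ \Sigma_{r_0} : \begin{cases} a_1 x + b_1 y + c_1 z = 0 \\ a_2 x + b_2 y + c_2 z = 0 \end{cases} \qquad \Sigma_{\pi_0} : ax + by + cz = 0. \]

Since $\mathrm{rk}(A) = 2$ and the upper two row vectors are linearly independent, we can write

\[ (a, b, c) = \lambda_1(a_1, b_1, c_1) + \lambda_2(a_2, b_2, c_2). \]

If $P = (x_0, y_0, z_0)$ is a point in $S_r$, then $a_i x_0 + b_i y_0 + c_i z_0 = 0$ for $i = 1, 2$. We can then write

\[ \begin{aligned} ax_0 + by_0 + cz_0 &= (\lambda_1 a_1 + \lambda_2 a_2)x_0 + (\lambda_1 b_1 + \lambda_2 b_2)y_0 + (\lambda_1 c_1 + \lambda_2 c_2)z_0 \\ &= \lambda_1(a_1 x_0 + b_1 y_0 + c_1 z_0) + \lambda_2(a_2 x_0 + b_2 y_0 + c_2 z_0) \\ &= 0 \end{aligned} \]

and this proves that $P \in S_\pi$, that is the inclusion $S_r \subset S_\pi$.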
Exercise 14.5.15 Given in $\mathbb{A}^3$ the line $r$ and the plane $\pi$ with cartesian equations

\[ \Sigma_r : \begin{cases} x - 2y - z + 1 = 0 \\ x + y - 2 = 0 \end{cases} \qquad \Sigma_\pi : 2x + y - 2z - 5 = 0, \]

their intersection is given by the solutions of the linear system $\Sigma_{\pi \cap r} : AX = B$ whose associated complete matrix is

\[ (A, B) = \begin{pmatrix} 1 & -2 & -1 & -1 \\ 1 & 1 & 0 & 2 \\ 2 & 1 & -2 & 5 \end{pmatrix} \mapsto (A', B'), \]

where $(A', B')$ denotes a suitable reduction.

Then $\mathrm{rk}(A) = 3$ and $\mathrm{rk}(A, B) = 3$, so the linear system $\Sigma_{\pi \cap r}$ has a unique solution, which corresponds to the unique point $P$ of intersection between $r$ and $\pi$. Solving the system, the coordinates of $P$ are computed to be $P = (\tfrac{3}{5}, \tfrac{7}{5}, -\tfrac{6}{5})$.

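Exercise 14.5.16 We consider in $\mathbb{A}^3$ the line $r$ and the plane $\pi_h$ with equations

\[ \Sigma_r : \begin{cases} x - 2y - z + 1 = 0 \\ x + y - 2 = 0 \end{cases} \qquad \Sigma_{\pi_h} : 2x + hy - 2z - 5 = 0, \]

where $h$ is a real parameter. The complete matrix of the linear system $\Sigma_{\pi_h \cap r} : A_h X = B$ giving the intersection of $\pi_h$ and $r$ is

\[ (A_h, B) = \begin{pmatrix} 1 & -2 & -1 & -1 \\ 1 & 1 & 0 & 2 \\ 2 & h & -2 & 5 \end{pmatrix}. \]

We notice that the rank of $A_h$ is at least 2, with $\mathrm{rk}(A_h) = 3$ if and only if $\det(A_h) \neq 0$. It is $\det(A_h) = -h - 4$, so $\mathrm{rk}(A_h) = 3$ if and only if $h \neq -4$. In such a case $\mathrm{rk}(A_h) = 3 = \mathrm{rk}(A_h, B)$, and this means that $r$ and $\pi_h$, for $h \neq -4$, have a unique point of intersection.

If $h = -4$, then $\mathrm{rk}(A_{-4}) = 2$: the reduction

\[ (A_{-4}, B) = \begin{pmatrix} 1 & -2 & -1 & -1 \\ 1 & 1 & 0 & 2 \\ 2 & -4 & -2 & 5 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -2 & -1 & -1 \\ 1 & 1 & 0 & 2 \\ 0 & 0 & 0 & 7 \end{pmatrix} \]

shows that $\mathrm{rk}(A_{-4}, B) = 3$, so the linear system $A_{-4}X = B$ has no solutions, and $r$ is parallel to $\pi_{-4}$.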
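The value of the determinant used in this exercise can be cross-checked with a few lines of Python. This is only a sketch: `det3` and `matrix_A` are hypothetical helper names, and the identity $\det(A_h) = -h - 4$ is tested on a finite range of integer values of $h$.

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, k, i) = m
    return a * (e * i - f * k) - b * (d * i - f * g) + c * (d * k - e * g)

def matrix_A(h):
    """Coefficient matrix A_h of the system for the line r and the plane pi_h."""
    return [[1, -2, -1],
            [1, 1, 0],
            [2, h, -2]]

# det(A_h) = -h - 4 on a sample of integer values of h.
formula_holds = all(det3(matrix_A(h)) == -h - 4 for h in range(-10, 11))

# The only sampled value of h for which the determinant vanishes.
singular_values = [h for h in range(-10, 11) if det3(matrix_A(h)) == 0]
```

Since the determinant is a polynomial of degree one in $h$, agreement at two distinct values already implies the identity; the wider range is used only for readability.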
Exercise 14.5.17 As in Exercise 14.5.15 we study the intersection of a plane $\pi$ (represented by a cartesian equation) and a line $r$ in $\mathbb{A}^3$ (represented by a parametric equation). Consider for instance,

\[ r : (x, y, z) = (3, -1, 5) + \lambda(1, -1, 2), \qquad \Sigma_\pi : x + y - z + 1 = 0. \]

As before, the intersection $\pi \cap r$ corresponds to the values of the parameter $\lambda$ for which the coordinates $P = (3 + \lambda, -1 - \lambda, 5 + 2\lambda)$ of a point in $r$ solve the cartesian equation for $\pi$, that is

\[ (3 + \lambda) + (-1 - \lambda) - (5 + 2\lambda) + 1 = 0 \Rightarrow -2\lambda - 2 = 0 \Rightarrow \lambda = -1. \]

We have then $r \cap \pi = \{(2, 0, 3)\}$.

14.5.18  Intersection  of  two  lines  in  A3 

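We consider a line $r$ and a line $r'$ in $\mathbb{A}^3$ with cartesian equations

\[ \Sigma_r : \begin{cases} a_1 x + b_1 y + c_1 z + d_1 = 0 \\ a_2 x + b_2 y + c_2 z + d_2 = 0 \end{cases} \qquad \Sigma_{r'} : \begin{cases} a'_1 x + b'_1 y + c'_1 z + d'_1 = 0 \\ a'_2 x + b'_2 y + c'_2 z + d'_2 = 0 \end{cases} \]

The intersection is given by the linear system $\Sigma_{r \cap r'}$ whose associated matrices are

\[ A = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a'_1 & b'_1 & c'_1 \\ a'_2 & b'_2 & c'_2 \end{pmatrix}, \qquad (A, B) = \begin{pmatrix} a_1 & b_1 & c_1 & -d_1 \\ a_2 & b_2 & c_2 & -d_2 \\ a'_1 & b'_1 & c'_1 & -d'_1 \\ a'_2 & b'_2 & c'_2 & -d'_2 \end{pmatrix}. \]

Once again, there are different possibilities, depending on the ranks of these matrices. As we stressed in the previous case 14.5.14, since $r$ and $r'$ are lines, the upper two row vectors $R_1$ and $R_2$ of both $A$ and $(A, B)$ are linearly independent, as are the last two row vectors, $R_3$ and $R_4$. Then,

\[ \begin{array}{cccc} \mathrm{rk}(A) & \mathrm{rk}((A, B)) & S_{\Sigma_{r \cap r'}} & r \cap r' \\ \hline 2 & 2 & \infty^1 & r \\ 3 & 3 & \infty^0 & \text{point} \\ 2 & 3 & \emptyset & \emptyset \\ 3 & 4 & \emptyset & \emptyset \end{array} \]

In the first case, with $\mathrm{rk}(A) = \mathrm{rk}(A, B) = 2$, the lines $r, r'$ coincide, while in the second case, with $\mathrm{rk}(A) = \mathrm{rk}(A, B) = 3$, they have a unique point of intersection, whose coordinates are given by the solution of the system $AX = B$.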
In the third and the fourth case, the condition $\mathrm{rk}(A) \neq \mathrm{rk}(A, B)$ means that the two lines do not intersect. If $\mathrm{rk}(A) = 2$, then the row vectors $R_3$ and $R_4$ of $A$ are both linearly dependent on $R_1$ and $R_2$, and therefore the homogeneous linear systems

\[ \Sigma_{r_0} : \begin{cases} a_1 x + b_1 y + c_1 z = 0 \\ a_2 x + b_2 y + c_2 z = 0 \end{cases} \qquad \Sigma_{r'_0} : \begin{cases} a'_1 x + b'_1 y + c'_1 z = 0 \\ a'_2 x + b'_2 y + c'_2 z = 0 \end{cases} \]

are equivalent. We have then that $S_r = S_{r'}$, the direction of $r$ coincides with that of $r'$, that is $r$ is parallel to $r'$. If $\mathrm{rk}(A) = 3$ (the fourth case in the table above) the lines are not parallel and do not intersect, so they are skew.

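Exercise 14.5.19 We consider the lines $r$ and $r'$ in $\mathbb{A}^3$ whose cartesian equations are

\[ \Sigma_r : \begin{cases} x - y + 2z + 1 = 0 \\ x + z - 1 = 0 \end{cases} \qquad \Sigma_{r'} : \begin{cases} y - z + 2 = 0 \\ x + y + z = 0 \end{cases} \]

We reduce the complete matrix associated to the linear system $\Sigma_{r \cap r'}$, that is

\[ (A, B) = \begin{pmatrix} 1 & -1 & 2 & -1 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & -1 & -2 \\ 1 & 1 & 1 & 0 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -1 & 2 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 1 & -1 & -2 \\ 0 & 2 & -1 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & -1 & 2 & -1 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & -4 \\ 0 & 0 & 1 & -3 \end{pmatrix} = (A', B'). \]

Since $\mathrm{rk}(A') = 3$ and $\mathrm{rk}(A', B') = 4$, the two lines are skew.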
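The rank table for two lines in $\mathbb{A}^3$ can be turned into a small classifier. The Python sketch below is illustrative only: `rank` and `position_of_lines` are hypothetical helpers, the rank is computed by Gaussian elimination over the rationals, and it is assumed that the first two rows and the last two rows of $A$ each define a line.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, given as a list of rows, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def position_of_lines(A, B):
    """Mutual position of two lines from the 4x3 matrix A and the column B."""
    AB = [row + [b] for row, b in zip(A, B)]
    rk_a, rk_ab = rank(A), rank(AB)
    if rk_a == rk_ab:
        return "coincident" if rk_a == 2 else "one point"
    return "parallel" if rk_a == 2 else "skew"

# Data of Exercise 14.5.19 (the column B collects the constants moved to the right).
A = [[1, -1, 2], [1, 0, 1], [0, 1, -1], [1, 1, 1]]
B = [-1, 1, -2, 0]
result = position_of_lines(A, B)
```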
15.1  Euclidean  Affine  Spaces

In the previous chapter we have dealt with the (real and linear) affine space $\mathbb{A}^n$ as modelled on the vector space $\mathbb{R}^n$. In this chapter we study the additional structures on $\mathbb{A}^n$ that come when passing from $\mathbb{R}^n$ to the euclidean space $E^n$ (see Chap. 3). Taking into account the scalar product allows one to introduce metric notions (such as distances and angles) into an affine space.

Definition 15.1.1 The affine space $\mathbb{A}^n$ associated to the Euclidean vector space $E^n = (\mathbb{R}^n, \cdot)$ is called the Euclidean affine space and denoted $\mathbb{E}^n$. A reference system $(O, \mathcal{B})$ for $\mathbb{E}^n$ is called cartesian orthogonal if the basis $\mathcal{B}$ for $E^n$ is orthonormal.

Recall that, if $\mathcal{B}$ is an orthonormal basis for $E^n$, the matrix of change of basis $M^{\mathcal{E},\mathcal{B}}$ (the matrix whose column vectors are the components of the vectors in $\mathcal{B}$ with respect to the canonical basis $\mathcal{E}$) is orthogonal by definition (see Chap. 10, Definition 10.1.1), and thus $\det(M^{\mathcal{E},\mathcal{B}}) = \pm 1$.

In  our  analysis  in  this  chapter  we  shall  always  consider  cartesian  orthogonal 
reference  systems. 

Exercise 15.1.2 Let $r$ be the (straight) line in $\mathbb{E}^2$ with vector equation

\[ (x, y) = (1, -2) + \lambda(1, -1). \]

We take $A = (1, -2)$ and $v = (1, -1)$. To determine a cartesian equation for $r$, as an alternative to the procedure described at length in the previous chapter (that is, removing the parameter $\lambda$), one observes that $\mathcal{L}(v)$ is the direction of $r$ and that the vector $u = (1, 1)$ is orthogonal to $v$, so we can write

\[
P = (x, y) \in r \iff P - A \in \mathcal{L}(v) \iff (P - A) \cdot u = 0 \iff (x - 1, y + 2) \cdot (1, 1) = 0.
\]

This condition can be written as

\[ x + y + 1 = 0, \]

yielding a cartesian equation $\Sigma_r$ for $r$.

This exercise shows that, if $r$ is a line in $\mathbb{E}^2$ whose vector equation is $r : P = A + \lambda v$, and $u$ is a vector orthogonal to $v$, so that for the direction of $r$ one has $S_r = \mathcal{L}(u)^{\perp}$, we have

\[ P \in r \iff (P - A) \cdot u = 0. \]

This expression is called the normal equation for the line $r$.

We  can  generalise  this  example  to  any  hyperplane. 

Proposition 15.1.3 Let $H \subset \mathbb{E}^n$ be a hyperplane, with $A \in H$. If $u \in \mathbb{R}^n$ is a non zero vector orthogonal to the direction $S_H$ of the hyperplane, that is $\mathcal{L}(u) = (S_H)^{\perp}$, then it holds that

\[ P \in H \iff (P - A) \cdot u = 0. \]

Definition 15.1.4 The equation

\[ N_H : (P - A) \cdot u = 0 \]

is called the normal equation of the hyperplane $H$ in $\mathbb{E}^n$. If $n = 2$, it yields the normal equation of a line; if $n = 3$, it yields the normal equation of a plane.

Remark 15.1.5 Notice that, as we have already seen for a cartesian equation in the previous chapter (see Remark 14.4.11), the normal equation $N_H$ for a given hyperplane in $\mathbb{E}^n$ is not uniquely determined, since $A$ can range in $H$ and the vector $u$ is given up to an arbitrary non zero scalar.

Remark 15.1.6 With a cartesian equation

\[ \Sigma_H : a_1 x_1 + \cdots + a_n x_n = b \]

for the hyperplane $H$ in $\mathbb{E}^n$, one has $S_H^{\perp} = \mathcal{L}((a_1, \ldots, a_n))$. This follows from the definition

\[
S_H = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : a_1 x_1 + \cdots + a_n x_n = 0\}
    = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : (a_1, \ldots, a_n) \cdot (x_1, \ldots, x_n) = 0\}.
\]

With $A$ an arbitrary point in $H$, a normal equation for $H$ is indeed given by

\[ N_H : (P - A) \cdot (a_1, \ldots, a_n) = 0. \]

Exercise 15.1.7 We determine both a cartesian and a normal equation for the plane $\pi$ in $\mathbb{E}^3$ whose direction is orthogonal to $u = (1, 2, 3)$ and that contains the point $A = (1, 0, -1)$. We have

\[ N_\pi : (x - 1, y, z + 1) \cdot (1, 2, 3) = 0, \]

equivalent to the cartesian equation

\[ \Sigma_\pi : x + 2y + 3z + 2 = 0. \]
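
The passage from a normal equation to a cartesian one amounts to expanding a scalar product. As a minimal sketch (the function name `cartesian_from_normal` is illustrative, not taken from the text), the hyperplane through a point $A$ with normal vector $u$ has cartesian equation $u \cdot X - u \cdot A = 0$:

```python
def cartesian_from_normal(point, normal):
    """Expand normal . (P - point) = 0 into coefficients . P + constant = 0.

    Returns the pair (coefficients, constant).
    """
    constant = -sum(u * a for u, a in zip(normal, point))
    return list(normal), constant


# Plane of Exercise 15.1.7: through A = (1, 0, -1), orthogonal to u = (1, 2, 3).
coeffs, const = cartesian_from_normal((1, 0, -1), (1, 2, 3))
print(coeffs, const)  # [1, 2, 3] 2, that is x + 2y + 3z + 2 = 0
```

The same function applied to the point $(1, -2)$ and the normal vector $(1, 1)$ returns the equation $x + y + 1 = 0$ of Exercise 15.1.2.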

Exercise 15.1.8 Given the (straight) line $r$ in $\mathbb{E}^2$ with cartesian equation

\[ \Sigma_r : 2x - 3y + 3 = 0 \]

we look for its normal equation. We start by noticing (see Remark 15.1.6) that the direction of $r$ is orthogonal to the vector $u = (2, -3)$, and that the point $A = (0, 1)$ lies in $r$, so we can write

\[ N_r : (P - (0, 1)) \cdot (2, -3) = 0 \iff (x, y - 1) \cdot (2, -3) = 0 \]

as a normal equation for $r$.

From the discussion above, it is clear that cartesian and normal equations for a hyperplane in a Euclidean affine space are closely related. Moreover, as we have discussed in the previous chapter, a generic linear affine variety in $\mathbb{A}^n$ can be described as a suitable intersection of hyperplanes. Therefore it should come as no surprise that a linear affine variety can be described in a Euclidean affine space in terms of a suitable number of normal equations. The general case is illustrated by the following exercise.

Exercise 15.1.9 Let $r$ be the line through the point $A = (1, 2, -3)$ in $\mathbb{E}^3$ which is orthogonal to the space $\mathcal{L}((1, 1, 0), (0, 1, -1))$. Its normal equation is given by

\[
N_r : \begin{cases} (P - A) \cdot (1, 1, 0) = 0 \\ (P - A) \cdot (0, 1, -1) = 0 \end{cases}
\]

that is

\[
\begin{cases} (x - 1, y - 2, z + 3) \cdot (1, 1, 0) = 0 \\ (x - 1, y - 2, z + 3) \cdot (0, 1, -1) = 0 \end{cases}
\]

yielding then the cartesian equation

\[
\Sigma_r : \begin{cases} x + y - 3 = 0 \\ y - z - 5 = 0 \end{cases}
\]


15.2  Orthogonality  Between  Linear  Affine  Varieties 

In the Euclidean affine space $\mathbb{E}^n$ there is the notion of orthogonality. Thus, we have:

Definition 15.2.1 One says that

(a) the lines $r, r' \subset \mathbb{E}^n$ are orthogonal if $v \cdot v' = 0$ for any $v \in S_r$ and any $v' \in S_{r'}$,

(b) the planes $\pi, \pi' \subset \mathbb{E}^3$ are orthogonal if $u \cdot u' = 0$ for any $u \in S_\pi^{\perp}$ and any $u' \in S_{\pi'}^{\perp}$,

(c) the line $r$ with direction $v$ is orthogonal to the plane $\pi$ in $\mathbb{E}^3$ if $v \in S_\pi^{\perp}$.

Exercise 15.2.2 We consider the following lines in $\mathbb{E}^2$,

\[
\begin{aligned}
\Sigma_{r_1} &: 2x - 2y + 1 = 0, \\
\Sigma_{r_2} &: x + y + 3 = 0, \\
r_3 &: (x, y) = (1, -3) + \lambda(1, 1), \\
N_{r_4} &: (x + 1, y - 4) \cdot (1, 2) = 0
\end{aligned}
\]

with directions spanned by the vectors

\[ v_1 = (2, 2), \qquad v_2 = (1, -1), \qquad v_3 = (1, 1), \qquad v_4 = (2, -1). \]

It is immediate to show that the only orthogonal pairs of lines among them are $r_1 \perp r_2$ and $r_2 \perp r_3$.
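
The check in Exercise 15.2.2 reduces to computing scalar products of direction vectors. A minimal sketch follows; the names `dot`, `directions` and `orthogonal_pairs` are illustrative. The direction of $r_4$ is taken here as $(2, -1)$, a vector orthogonal to the normal vector $(1, 2)$ appearing in its normal equation.

```python
from itertools import combinations


def dot(u, v):
    """Standard scalar product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


# Direction vectors of r1, ..., r4; a line ax + by + c = 0 has direction (b, -a) up to scale.
directions = {1: (2, 2), 2: (1, -1), 3: (1, 1), 4: (2, -1)}

orthogonal_pairs = [
    (i, j)
    for i, j in combinations(sorted(directions), 2)
    if dot(directions[i], directions[j]) == 0
]
print(orthogonal_pairs)  # [(1, 2), (2, 3)]
```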

Exercise 15.2.3 Consider the lines $r, r' \subset \mathbb{E}^3$ given by

\[
r : (x, y, z) = (1, 2, 1) + \lambda(3, 0, -1), \qquad
r' : \begin{cases} x = 3 + \mu \\ y = 2 - 2\mu \\ z = 3\mu \end{cases}
\]

We have $S_r = \mathcal{L}((3, 0, -1))$ and $S_{r'} = \mathcal{L}((1, -2, 3))$. Since

\[ (3, 0, -1) \cdot (1, -2, 3) = 0 \]

we conclude that $r$ is orthogonal to $r'$.

Exercise 15.2.4 Let $\pi$ be the plane in $\mathbb{E}^3$ whose cartesian equation is

\[ \Sigma_\pi : x - y + 2z - 3 = 0. \]

In order to find an equation for the line $r$ through $A = (1, 2, 1)$ which is orthogonal to $\pi$ we notice from Remark 15.1.6 that $S_\pi^{\perp} = \mathcal{L}((1, -1, 2))$: we can then write

\[ r : (x, y, z) = (1, 2, 1) + \lambda(1, -1, 2). \]

Exercise 15.2.5 Consider in $\mathbb{E}^3$ the line $r$ given by

\[
\Sigma_r : \begin{cases} x - 2y + z - 1 = 0 \\ x + y = 0 \end{cases}
\]

We seek to determine:

(1) a cartesian equation for the plane $\pi$ through the point $A = (-1, -1, -1)$ and orthogonal to $r$,

(2) the intersection between $r$ and $\pi$.

We proceed as follows.

(1) From the cartesian equation $\Sigma_r$ we have that

\[ S_r^{\perp} = \mathcal{L}((1, -2, 1), (1, 1, 0)) \]

and this subspace yields the direction $S_\pi$. Since $A \in \pi$, a vector equation for $\pi$ is given by

\[ \pi : (x, y, z) = -(1, 1, 1) + \lambda(1, -2, 1) + \mu(1, 1, 0). \]

By noticing that $S_\pi^{\perp} = \mathcal{L}((1, -1, -3))$, a normal equation for $\pi$ is given by

\[ N_\pi : (P - A) \cdot (1, -1, -3) = 0 \]

yielding the cartesian equation

\[ \Sigma_\pi : x - y - 3z - 3 = 0. \]

(2) The intersection $\pi \cap r$ is clearly given by the unique solution of the linear system

\[
\Sigma_{\pi \cap r} : \begin{cases} x - 2y + z - 1 = 0 \\ x + y = 0 \\ x - y - 3z - 3 = 0 \end{cases}
\]

which is $P = \frac{1}{11}(6, -6, -7)$.
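
The two steps of Exercise 15.2.5 can be reproduced with exact arithmetic. This is a minimal sketch under two assumptions that are not in the text: the direction of $r$ is obtained as the vector product of the two normal vectors of its cartesian equations, and the $3 \times 3$ system is solved by Cramer's rule. The helper names `cross`, `det3` and `solve3` are illustrative.

```python
from fractions import Fraction


def cross(u, v):
    """Vector product in three dimensions."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Cramer's rule for matrix . X = rhs, assuming a non zero determinant."""
    d = det3(matrix)
    solution = []
    for k in range(3):
        replaced = [row[:k] + [r] + row[k + 1:] for row, r in zip(matrix, rhs)]
        solution.append(Fraction(det3(replaced), d))
    return tuple(solution)


n1, n2 = (1, -2, 1), (1, 1, 0)      # normal vectors of the two planes defining r
direction_r = cross(n1, n2)          # spans S_r, hence it is normal to the plane pi
A = (-1, -1, -1)
const = -sum(u * a for u, a in zip(direction_r, A))
print(direction_r, const)            # (-1, 1, 3) 3, i.e. x - y - 3z - 3 = 0 up to sign

# Intersection of r with pi: each equation n . X = -constant.
matrix = [list(n1), list(n2), list(direction_r)]
rhs = [1, 0, -const]
P = solve3(matrix, rhs)
print(P)                             # the point (6/11, -6/11, -7/11)
```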
Exercise 15.2.6 We consider again the lines $r$ and $r'$ in $\mathbb{E}^3$ from Exercise 15.2.3. We know that $r \perp r'$. We determine the plane $\pi$ which is orthogonal to $r'$ and such that $r \subset \pi$. Since $\mathcal{L}((1, -2, 3))$ is the direction of $r'$, we can write from Remark 15.1.6 that

\[ \Sigma_\pi : x - 2y + 3z + d = 0 \]

with $d$ a real parameter. The line $r$ is in $\pi$ if and only if the coordinates of each of its points $P = (1 + 3\lambda, 2, 1 - \lambda) \in r$ solve the equation $\Sigma_\pi$, that is, if and only if the equation

\[ (1 + 3\lambda) - 2(2) + 3(1 - \lambda) + d = 0 \]

is satisfied for each value of $\lambda$. This is true if and only if $d = 0$, so a cartesian equation for $\pi$ is

\[ \Sigma_\pi : x - 2y + 3z = 0. \]

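Exercise 15.2.7 For the planes

\[ \Sigma_\pi : 2x + y - z - 3 = 0, \qquad \Sigma_{\pi'} : x + y + 3z - 1 = 0 \]

in $\mathbb{E}^3$ we have $S_\pi^{\perp} = \mathcal{L}((2, 1, -1))$ and $S_{\pi'}^{\perp} = \mathcal{L}((1, 1, 3))$. We conclude that $\pi$ is orthogonal to $\pi'$, since $(2, 1, -1) \cdot (1, 1, 3) = 0$. Notice that

\[ (2, 1, -1) \in S_{\pi'} = \{(a, b, c) : a + b + 3c = 0\}, \]

that is $S_\pi^{\perp} \subset S_{\pi'}$. We can analogously show that $S_{\pi'}^{\perp} \subset S_\pi$. This leads to the following remark.

Remark 15.2.8 The planes $\pi, \pi' \subset \mathbb{E}^3$ are orthogonal if and only if $S_\pi^{\perp} \subset S_{\pi'}$ (or equivalently if and only if $S_{\pi'}^{\perp} \subset S_\pi$).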

In  order  to  recap  the  results  we  described  in  the  previous  pages,  we  consider  the 
following  example. 

Exercise 15.2.9 Consider the point $A = (1, 0, 1)$ in $\mathbb{E}^3$ and the lines $r$, $s$ with equations

\[
r : (x, y, z) = (1, 2, 1) + \lambda(3, 0, -1), \qquad
\Sigma_s : \begin{cases} x - y + z + 2 = 0 \\ x - z + 1 = 0 \end{cases}
\]

We seek to determine:

(a) the set $\mathcal{F}$ of lines through $A$ which are orthogonal to $r$,

(b) the line $l \in \mathcal{F}$ which is parallel to the plane $\pi$ given by $\Sigma_\pi : x - y + z + 2 = 0$,

(c) the line $l' \in \mathcal{F}$ which is orthogonal to $s$,

(d) the lines $q \subset \pi'$ with $\Sigma_{\pi'} : y - 2 = 0$ which are orthogonal to $r$.

For these we proceed as follows.

(a) A line $u$ through $A$ has a vector equation

\[ (x, y, z) = (1, 0, 1) + \lambda(a, b, c) \]

with arbitrary direction $S_u = \mathcal{L}((a, b, c))$. Since $u$ is to be orthogonal to $r$, we have the condition $(a, b, c) \cdot (3, 0, -1) = 3a - c = 0$. The set $\mathcal{F}$ is then given by the union $\mathcal{F} = \{r_\alpha\}_{\alpha \in \mathbb{R}} \cup \{\tilde{r}\}$ with

\[ r_\alpha : (x, y, z) = (1, 0, 1) + \mu(1, \alpha, 3) \quad \text{for } a \neq 0, \]

and

\[ \tilde{r} : (x, y, z) = (1, 0, 1) + \mu(0, 1, 0) \quad \text{for } a = c = 0. \]

(b) Since the direction $S_\pi$ of the plane $\pi$ is given by the subspace orthogonal to $\mathcal{L}((1, -1, 1))$, it is clear from $(0, 1, 0) \cdot (1, -1, 1) \neq 0$ that the line $\tilde{r}$ is not parallel to $\pi$. This means that the line $l$ must be found within the set $\{r_\alpha\}_{\alpha \in \mathbb{R}}$. If we impose that $(1, \alpha, 3) \cdot (1, -1, 1) = 0$, we have $\alpha = 4$, so the line $l$ is given by

\[ l : (x, y, z) = (1, 0, 1) + \mu(1, 4, 3). \]

(c) A vector equation for $s$ is given by solving the linear system $\Sigma_s$ in terms of one free unknown. It is immediate to show that

\[ s : (x, y, z) = (-1 + \eta, 1 + 2\eta, \eta) = (-1, 1, 0) + \eta(1, 2, 1). \]

The condition $r_\alpha \perp s$ is equivalent to $(1, \alpha, 3) \cdot (1, 2, 1) = 0$, reading $\alpha = -2$, so we have

\[ l' : (x, y, z) = (1, 0, 1) + \mu(1, -2, 3). \]

This is the unique solution to the problem: we directly inspect that $\tilde{r}$ is not orthogonal to $s$, since $(0, 1, 0) \cdot (1, 2, 1) = 2 \neq 0$.

(d) A plane $\pi_h$ is orthogonal to $r$ if and only if it has a cartesian equation of the form

\[ \Sigma_{\pi_h} : 3x - z + h = 0. \]

The lines $q_h$ are then given by the intersection

\[
\Sigma_{q_h} = \Sigma_{\pi_h \cap \pi'} : \begin{cases} 3x - z + h = 0 \\ y - 2 = 0 \end{cases}
\]

with $h \in \mathbb{R}$.
15.3 The Distance Between Linear Affine Varieties

The distance between two points $A$ and $B$ on a plane is naturally defined to be the length of the line segment whose endpoints are $A$ and $B$. This definition can be consistently formulated in a Euclidean affine space.

Definition 15.3.1 Let $A$ and $B$ be a pair of points in $\mathbb{E}^n$. The distance $d(A, B)$ between them is defined as

\[ d(A, B) = \|B - A\| = \sqrt{(B - A) \cdot (B - A)}. \]

Exercise 15.3.2 If $A = (1, 2, 0, -1)$ and $B = (0, -1, 2, 2)$ are points in $\mathbb{E}^4$, then

\[ d(A, B) = \|(-1, -3, 2, 3)\| = \sqrt{23}. \]

The  well  known  properties  of  a  Euclidean  distance  function  are  a  consequence  of 
the  corresponding  properties  of  the  scalar  product. 

Proposition 15.3.3 For any points $A$, $B$, $C$ in $\mathbb{E}^n$ the following properties hold.

(1) $d(A, B) \geq 0$,

(2) $d(A, B) = 0$ if and only if $A = B$,

(3) $d(A, B) = d(B, A)$,

(4) $d(A, B) + d(B, C) \geq d(A, C)$.

In order to introduce a notion of distance between a point and a linear affine variety, we start by looking at an example. Let us consider in $\mathbb{E}^2$ the point $A = (0, 0)$ and the line $r$ whose vector equation is $(x, y) = (1, 1) + \lambda(1, -1)$. By denoting $P_\lambda = (1 + \lambda, 1 - \lambda)$ a generic point in $r$, we compute

\[ d(A, P_\lambda) = \sqrt{2 + 2\lambda^2}. \]

It is immediate to verify that, as a function of $\lambda$, the quantity $d(A, P_\lambda)$ ranges between $\sqrt{2}$ and $+\infty$: it is therefore natural to consider the minimum of this range as the distance between $A$ and $r$. We have then $d(A, r) = \sqrt{2}$.
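
The minimisation in this example can be carried out in closed form. As a minimal sketch (the function name `distance_point_line` is illustrative), the squared distance $\|P_0 + t v - A\|^2$ is a quadratic polynomial in $t$, whose minimum is attained at $t = -\frac{(P_0 - A) \cdot v}{v \cdot v}$:

```python
import math


def distance_point_line(a, p0, v):
    """Distance from the point a to the line p0 + t v.

    Returns (t_min, distance), where t_min is the parameter of the closest point.
    """
    w = [p - q for p, q in zip(p0, a)]
    t_min = -sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)
    closest = [p + t_min * vi for p, vi in zip(p0, v)]
    return t_min, math.dist(a, closest)


# The example above: A = (0, 0) and r : (x, y) = (1, 1) + t (1, -1).
t_min, d = distance_point_line((0, 0), (1, 1), (1, -1))
print(t_min, d)  # the minimum is at t = 0, with distance sqrt(2)
```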

Definition 15.3.4 If $L$ is a linear affine variety and $A$ is a point in $\mathbb{E}^n$, the distance $d(A, L)$ between $A$ and $L$ is defined to be

\[ d(A, L) = \min\{d(A, B) : B \in L\}. \]

Remark 15.3.5 It is evident from the definition above that $d(A, L) = 0$ if and only if $A \in L$. We shall indeed prove that, given a point $A$ and a linear affine variety $L$ in $\mathbb{E}^n$, there always exists a point $A_0 \in L$ such that $d(A, L) = d(A, A_0)$, thus showing that the previous definition is well posed.

Proposition 15.3.6 Let $L$ be a linear affine variety and $A \notin L$ a point in $\mathbb{E}^n$. It holds that

\[ d(A, L) = d(A, A_0) \quad \text{with} \quad A_0 = L \cap (A + S_L^{\perp}). \]

Here the set $A + S_L^{\perp}$ denotes the linear affine variety through $A$ whose direction is $S_L^{\perp}$. The point $A_0$ is called the orthogonal projection of $A$ on $L$.

Proof Since the linear system $\Sigma_{L \cap (A + S_L^{\perp})}$ given by the cartesian equations $\Sigma_L$ and $\Sigma_{A + S_L^{\perp}}$ is of rank $n$ with $n$ unknowns, the intersection $L \cap (A + S_L^{\perp})$ consists of a single point that we denote by $A_0$.

Let $B$ be an arbitrary point in $L$. We can decompose

\[ A - B = (A - A_0) + (A_0 - B), \]

with $A_0 - B \in S_L$ (since both $A_0$ and $B$ are in $L$) and $A - A_0 \in S_L^{\perp}$ (since both $A$ and $A_0$ are points in the linear affine variety $A + S_L^{\perp}$). We have then

\[ (A - A_0) \cdot (A_0 - B) = 0 \]

and we write

\[
(d(A, B))^2 = \|A - B\|^2 = \|(A - A_0) + (A_0 - B)\|^2 = \|A - A_0\|^2 + \|A_0 - B\|^2.
\]

As a consequence,

\[ (d(A, B))^2 \geq \|A - A_0\|^2 = (d(A, A_0))^2 \]

for any $B \in L$, and this proves the claim. □

Exercise 15.3.7 Let us compute the distance between the line $r : 2x + y + 4 = 0$ and the point $A = (1, -1)$ in $\mathbb{E}^2$. We start by finding the line $s_A = A + S_r^{\perp}$ through $A$ which is orthogonal to $r$. The direction $S_r^{\perp}$ is spanned by the vector $(2, 1)$, so we have

\[ s_A : (x, y) = (1, -1) + \lambda(2, 1). \]

The intersection $A_0 = r \cap s_A$ is then given by the value of the parameter $\lambda$ that solves the equation

\[ 2(1 + 2\lambda) + (-1 + \lambda) + 4 = 0, \]

that is $\lambda = -1$, giving $A_0 = (-1, -2)$. Therefore we have

\[ d(A, r) = d(A, A_0) = \|(2, 1)\| = \sqrt{5}. \]

Exercise 15.3.8 Let us consider in $\mathbb{E}^3$ the point $A = (1, -1, 0)$ and the line $r$ with vector equation $r : (x, y, z) = (1, 2, 1) + \lambda(1, -1, 2)$. In order to compute the distance between $A$ and $r$ we first determine the plane $\pi_A := A + S_r^{\perp}$. Since the direction of $r$ must be orthogonal to $\pi_A$, from Remark 15.1.6 the cartesian equation for $\pi_A$ is given by

\[ \Sigma_{\pi_A} : x - y + 2z + d = 0, \]

with $d \in \mathbb{R}$. The value of $d$ is fixed by asking that $A \in \pi_A$, that is $1 + 1 + d = 0$, giving $d = -2$. We then have

\[ \Sigma_{\pi_A} : x - y + 2z - 2 = 0. \]

The point $A_0$ is now the intersection $r \cap \pi_A$, which is given by the value $\lambda = \frac{1}{6}$ which solves

\[ (1 + \lambda) - (2 - \lambda) + 2(1 + 2\lambda) - 2 = 0. \]

It is therefore $A_0 = \left(\frac{7}{6}, \frac{11}{6}, \frac{4}{3}\right)$, with

\[ d(A, r) = d(A, A_0) = \left\|\left(\frac{1}{6}, \frac{17}{6}, \frac{4}{3}\right)\right\| = \sqrt{\frac{59}{6}}. \]
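
The orthogonal projection of Proposition 15.3.6 has a simple closed form when $L$ is a line. A minimal sketch with exact arithmetic follows, assuming the line is given by a point $P_0$ and a direction $v$; the function name `project_on_line` is illustrative. The projection is $A_0 = P_0 + t v$ with $t = \frac{(A - P_0) \cdot v}{v \cdot v}$, which is equivalent to intersecting the line with $A + S_r^{\perp}$.

```python
from fractions import Fraction


def project_on_line(a, p0, v):
    """Orthogonal projection of the point a on the line p0 + t v (integer input, exact output)."""
    t = Fraction(sum((ai - pi) * vi for ai, pi, vi in zip(a, p0, v)),
                 sum(vi * vi for vi in v))
    return tuple(pi + t * vi for pi, vi in zip(p0, v))


# Exercise 15.3.8: A = (1, -1, 0) and r : (1, 2, 1) + t (1, -1, 2).
A = (1, -1, 0)
A0 = project_on_line(A, (1, 2, 1), (1, -1, 2))
squared_distance = sum((x - y) ** 2 for x, y in zip(A, A0))
print(A0, squared_distance)  # A0 = (7/6, 11/6, 4/3) and d(A, r)^2 = 59/6
```

For Exercise 15.3.7 one can take, for instance, the point $(-2, 0)$ and the direction $(1, -2)$ on the line $2x + y + 4 = 0$; the projection of $(1, -1)$ is then $(-1, -2)$.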

The next theorem yields a formula which allows one to compute more directly the distance $d(Q, H)$ between a point $Q$ and a hyperplane $H$ in $\mathbb{E}^n$.

Theorem 15.3.9 Let $H$ be a hyperplane and $Q$ a point in $\mathbb{E}^n$ with $\Sigma_H : a_1 x_1 + \cdots + a_n x_n + b = 0$ and $Q = (x'_1, \ldots, x'_n)$. The distance between $Q$ and $H$ is given by

\[ d(Q, H) = \frac{|a_1 x'_1 + \cdots + a_n x'_n + b|}{\sqrt{a_1^2 + \cdots + a_n^2}}. \]

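Proof If we consider $X = (x_1, \ldots, x_n)$ and $A = (a_1, \ldots, a_n)$ as vectors in $\mathbb{R}^n$, using the scalar product in $E^n$, the cartesian equation for $H$ can be written as

\[ \Sigma_H : A \cdot X + b = 0. \]

We know that $A \in S_H^{\perp}$, so the line $r$ through $Q$ which is orthogonal to $H$ is made of the points $P$ such that

\[ r : P = Q + \lambda A. \]

The intersection point $Q_0 = r \cap H$ is given by replacing $X$ in $\Sigma_H$ with such a $P$, that is

\[
A \cdot (Q + \lambda A) + b = 0
\quad \Rightarrow \quad A \cdot Q + \lambda\, A \cdot A + b = 0
\quad \Rightarrow \quad \lambda = -\frac{A \cdot Q + b}{\|A\|^2}.
\]

The equation for $r$ gives then

\[ Q_0 = Q - \frac{A \cdot Q + b}{\|A\|^2}\, A. \]

We can now easily compute

\[
\|Q - Q_0\|^2 = \left\| \frac{A \cdot Q + b}{\|A\|^2}\, A \right\|^2
= \frac{|A \cdot Q + b|^2}{\|A\|^4}\, \|A\|^2
= \frac{|A \cdot Q + b|^2}{\|A\|^2},
\]

therefore getting

\[
d(Q, H) = \frac{|A \cdot Q + b|}{\|A\|} = \frac{|a_1 x'_1 + \cdots + a_n x'_n + b|}{\sqrt{a_1^2 + \cdots + a_n^2}}
\]

as claimed. □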


Exercise 15.3.10 Consider the line $r$ with cartesian equation $\Sigma_r : 2x + y + 4 = 0$ and the point $A = (1, -1)$ in $\mathbb{E}^2$ as in Exercise 15.3.7 above. From Theorem 15.3.9 we have

\[ d(A, r) = \frac{|2 - 1 + 4|}{\sqrt{4 + 1}} = \sqrt{5}. \]

Exercise 15.3.11 By making use again of Theorem 15.3.9 it is easy to compute the distance between the point $A = (1, 2, -1)$ and the plane $\pi$ in $\mathbb{E}^3$ with $\Sigma_\pi : x + 2y - 2z + 3 = 0$. We have

\[ d(A, \pi) = \frac{|1 + 4 + 2 + 3|}{\sqrt{1 + 4 + 4}} = \frac{10}{3}. \]
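
The formula of Theorem 15.3.9 translates directly into a short function. This is a minimal sketch; the name `distance_point_hyperplane` and its argument order are illustrative choices. The hyperplane $a_1 x_1 + \cdots + a_n x_n + b = 0$ is passed as the coefficient tuple together with the constant $b$.

```python
import math


def distance_point_hyperplane(q, coeffs, b):
    """|a_1 q_1 + ... + a_n q_n + b| / sqrt(a_1^2 + ... + a_n^2)."""
    numerator = abs(sum(a * x for a, x in zip(coeffs, q)) + b)
    return numerator / math.sqrt(sum(a * a for a in coeffs))


# Exercise 15.3.10: point (1, -1) and line 2x + y + 4 = 0, expected sqrt(5).
d_line = distance_point_hyperplane((1, -1), (2, 1), 4)
# Exercise 15.3.11: point (1, 2, -1) and plane x + 2y - 2z + 3 = 0, expected 10/3.
d_plane = distance_point_hyperplane((1, 2, -1), (1, 2, -2), 3)
print(d_line, d_plane)
```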


We  generalise  the  analysis  above  with  a  natural  definition  for  the  distance  between 
any  two  linear  affine  varieties. 

Definition 15.3.12 Let $L$ and $L'$ be two linear affine varieties in $\mathbb{E}^n$. The distance between them is defined as the non negative real number

\[ d(L, L') = \min\{d(A, A') : A \in L, \; A' \in L'\}. \]

It is evident that $d(L, L') = 0$ if and only if $L \cap L' \neq \emptyset$. It is indeed possible to show that the previous definition is consistent even when $L \cap L' = \emptyset$. Moreover one can show that there exist a point $\bar{A} \in L$ and a point $\bar{A}' \in L'$ such that the minimum distance is attained for them, that is $d(\bar{A}, \bar{A}') \leq d(A, A')$ for any $A \in L$ and $A' \in L'$. For such a pair of points it is $d(L, L') = d(\bar{A}, \bar{A}')$.

In the following pages we shall study the following cases of linear varieties which do not intersect:

• lines $r, r'$ in $\mathbb{E}^2$ which are parallel,

• planes $\pi, \pi'$ in $\mathbb{E}^3$ which are parallel,

• a plane $\pi$ and a line $r$ in $\mathbb{E}^3$ which are parallel,

• lines $r, r'$ in $\mathbb{E}^3$ which are parallel.

Remark 15.3.13 Consider lines $r$ and $r'$ in $\mathbb{E}^2$ which are parallel and distinct. Their cartesian equations are

\[ \Sigma_r : ax + by + c = 0, \qquad \Sigma_{r'} : ax + by + c' = 0, \]

for $c' \neq c$. Let $A = (x'_1, x'_2) \in r$, that is $a x'_1 + b x'_2 + c = 0$. From Theorem 15.3.9 it is

\[ d(A, r') = \frac{|a x'_1 + b x'_2 + c'|}{\sqrt{a^2 + b^2}} = \frac{|c' - c|}{\sqrt{a^2 + b^2}}. \]

From Definition 15.3.12 we have $d(A, A') \geq d(A, r')$. Since the value $d(A, r')$ we have computed does not depend on the coordinates of $A \in r$, we have that $d(A, r')$ is the minimum value for $d(A, A')$ when $A$ ranges in $r$ and $A'$ in $r'$, so we conclude that

\[ d(r, r') = \frac{|c' - c|}{\sqrt{a^2 + b^2}}. \]

Notice that, with respect to the same lines, we also have

\[ d(A', r) = \frac{|c - c'|}{\sqrt{a^2 + b^2}} = d(A, r'). \]

Exercise 15.3.14 Consider the parallel lines $r, r' \subset \mathbb{E}^2$ with cartesian equations

\[ \Sigma_r : 2x + y - 3 = 0, \qquad \Sigma_{r'} : 2x + y + 2 = 0. \]

The distance between them is

\[ d(r, r') = \frac{|2 - (-3)|}{\sqrt{4 + 1}} = \sqrt{5}. \]


The distance between two parallel hyperplanes in $\mathbb{E}^n$ is given by generalising the proof of Theorem 15.3.9.

Proposition 15.3.15 If $H$ and $H'$ are parallel hyperplanes in $\mathbb{E}^n$ with cartesian equations

\[ \Sigma_H : a_1 x_1 + \cdots + a_n x_n + b = 0, \qquad \Sigma_{H'} : a_1 x_1 + \cdots + a_n x_n + b' = 0, \]

then $d(H, H') = d(Q, H')$, where $Q$ is an arbitrary point in $H$, and therefore it is

\[ d(H, H') = \frac{|b - b'|}{\sqrt{a_1^2 + \cdots + a_n^2}}. \]

Proof We proceed as in Theorem 15.3.9, so we set $X = (x_1, \ldots, x_n)$ and $A = (a_1, \ldots, a_n)$ and write

\[ \Sigma_H : A \cdot X + b = 0, \qquad \Sigma_{H'} : A \cdot X + b' = 0. \]

As we argued in Remark 15.3.13, by setting $Q = X$ with $A \cdot X + b = 0$, as an arbitrary point in $H$, we have

\[ d(Q, H') = \frac{|A \cdot X + b'|}{\|A\|} = \frac{|b' - b|}{\|A\|} \]

and since such a distance does not depend on $Q$, we conclude that $d(H, H') = d(Q, H')$. □

Exercise 15.3.16 The planes

\[ \Sigma_\pi : x + 2y - z + 2 = 0, \qquad \Sigma_{\pi'} : x + 2y - z - 4 = 0 \]

are parallel and distinct. The distance between them is

\[ d(\pi, \pi') = \frac{|2 + 4|}{\sqrt{1 + 4 + 1}} = \sqrt{6}. \]

Parallel linear affine varieties need not have the same dimension. The next proposition gives a result for this situation.

Proposition 15.3.17 Let $r$ be a line and $H$ a hyperplane in $\mathbb{E}^n$, with $r$ parallel to $H$. It is

\[ d(r, H) = d(\bar{P}, H), \]

where $\bar{P}$ is any point in $r$.

Proof With the notation previously adopted, that is $A = (a_1, \ldots, a_n)$ and $X = (x_1, \ldots, x_n)$, we represent $H$ by the cartesian equation

\[ \Sigma_H : A \cdot X + b = 0 \]

and $r$ by the vector equation

\[ r : P = \bar{P} + \lambda v \]

where $\bar{P} \in r$, while $v \cdot A = 0$ since $r$ is parallel to $H$. From Theorem 15.3.9 we have

\[
d(P, H) = \frac{|A \cdot P + b|}{\|A\|} = \frac{|A \cdot (\bar{P} + \lambda v) + b|}{\|A\|} = \frac{|A \cdot \bar{P} + b|}{\|A\|}.
\]

This expression does not depend on $\lambda$: this is the reason why $d(P, H) = d(\bar{P}, H) = d(r, H)$. □

Exercise 15.3.18 Consider in $\mathbb{E}^3$ the line $r$ and the plane $\pi$ given by:

\[
\Sigma_r : \begin{cases} 2x - y + z - 2 = 0 \\ y + 2z = 0 \end{cases}
\qquad
\Sigma_\pi : 2x - y + z + 3 = 0.
\]

Since $r$ is parallel to $\pi$, we take the point $P = (1, 0, 0)$ in $r$ and compute the distance between $P$ and $\pi$. One gets

\[ d(r, \pi) = d(P, \pi) = \frac{5}{\sqrt{6}}. \]

Exercise 15.3.19 Consider the lines $r$ and $r'$ in $\mathbb{E}^3$ given by the vector equations

\[ r : (x, y, z) = (3, 1, 2) + \lambda(1, 2, 0), \qquad r' : (x, y, z) = (-1, -2, 3) + \lambda(1, 2, 0). \]

Since $r$ is parallel to $r'$, the distance between them can be computed by proceeding as in the previous exercises, that is $d(r, r') = d(A, r') = d(B, r)$, where $A$ is an arbitrary point in $r$ and $B$ an arbitrary point in $r'$.

We illustrate an alternative method. We notice that, if $\pi$ is a plane orthogonal to both $r$ and $r'$, then the distance is $d(r, r') = d(P, P')$ where $P = \pi \cap r$ and $P' = \pi \cap r'$. We consider the plane $\pi$ through the origin which is orthogonal to both $r$ and $r'$, and whose cartesian equation is

\[ \Sigma_\pi : x + 2y = 0. \]

Direct calculations show that $P = \pi \cap r = (2, -1, 2)$ and $P' = \pi \cap r' = (0, 0, 3)$, so

\[ d(r, r') = d(P, P') = \sqrt{6}. \]

We  end  the  section  by  sketching  how  to  define  the  distance  between  skew  lines 
in  E3. 

Remark 15.3.20 If $r$ and $r'$ are skew lines in $\mathbb{E}^3$, then there exist a point $P \in r$ and a point $P' \in r'$ which are the intersections of the lines $r$ and $r'$ with the unique line $s$ orthogonally intersecting both $r$ and $r'$. The line $s$ is the minimum distance line for $r$ and $r'$, and the distance is $d(r, r') = d(P, P')$.

Exercise 15.3.21 We consider the skew lines in $\mathbb{E}^3$,

\[ r : (x, y, z) = \lambda(1, -1, 1), \qquad r' : (x, y, z) = (0, 0, 1) + \mu(1, 0, 1). \]

The subspace $N \subset E^3$ which is orthogonal to both the directions $S_r$ and $S_{r'}$ is $N = \mathcal{L}((1, 0, -1))$. The minimum distance line $s$ for the given $r$ and $r'$ has the direction $S_s = N$, and intersects $r$ in a point $P$ and $r'$ in a point $P'$. Since $P \in r$ and $P' \in r'$, there exist a value for $\lambda$ and a value for $\mu$ such that $P = Q(\lambda)$ and $P' = Q'(\mu)$ with

\[ Q(\lambda) + \nu(1, 0, -1) = Q'(\mu), \]

where $\nu$ is the parameter for $s$. The points $P = s \cap r$ and $P' = s \cap r'$ are then those corresponding to the values of the parameters $\lambda$ and $\mu$ solving such a relation, that is

\[
\begin{cases} \lambda + \nu = \mu \\ -\lambda = 0 \\ \lambda - \nu = 1 + \mu \end{cases}
\]

One finds $\lambda = 0$, $\mu = \nu = -\frac{1}{2}$, so $P = (0, 0, 0)$, $P' = \frac{1}{2}(-1, 0, 1)$ and

\[ d(r, r') = d(P, P') = \frac{1}{\sqrt{2}}. \]
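
As a cross-check of Exercise 15.3.21, the distance between two skew lines can also be computed without finding $P$ and $P'$ explicitly. The sketch below uses the standard projection formula $d(r, r') = \frac{|(Q' - Q) \cdot n|}{\|n\|}$ with $n = v \times v'$, which is a different route from the one followed in the exercise; the function names are illustrative.

```python
import math


def cross(u, v):
    """Vector product in three dimensions."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def distance_skew_lines(q, v, q_prime, v_prime):
    """Distance between the skew lines q + t v and q' + s v'."""
    n = cross(v, v_prime)  # orthogonal to both directions, non zero for skew lines
    w = [b - a for a, b in zip(q, q_prime)]
    return abs(sum(wi * ni for wi, ni in zip(w, n))) / math.sqrt(sum(ni * ni for ni in n))


# Lines of Exercise 15.3.21.
n = cross((1, -1, 1), (1, 0, 1))
d = distance_skew_lines((0, 0, 0), (1, -1, 1), (0, 0, 1), (1, 0, 1))
print(n, d)  # n = (-1, 0, 1), proportional to (1, 0, -1); d equals 1/sqrt(2)
```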


15.4  Bundles  of  Lines  and  of  Planes 

A  useful  notion  for  several  kinds  of  problems  in  affine  geometry  is  that  of  bundle  of 
lines  and  bundle  of  planes. 

Definition 15.4.1 Given a point $A$ in $\mathbb{E}^2$, the bundle of concurrent lines with center (or point of concurrency) $A$ is the set of all the lines through $A$ in $\mathbb{E}^2$; we shall denote it by $\mathcal{F}_A$.

The  next  result  is  immediate. 

Proposition 15.4.2 With $A = (x_0, y_0) \in \mathbb{E}^2$, the cartesian equation of an arbitrary line in the bundle $\mathcal{F}_A$ through $A$ is given by

\[ \Sigma_{(\alpha, \beta)} : \alpha(x - x_0) + \beta(y - y_0) = 0 \]

for any choice of the real parameters $\alpha$ and $\beta$ such that $(\alpha, \beta) \in \mathbb{R}^2 \setminus \{(0, 0)\}$.

Notice that the parameters $\alpha$ and $\beta$ label a line in $\mathcal{F}_A$, but there is not a bijection between pairs $(\alpha, \beta)$ and lines in $\mathcal{F}_A$: the pairs $(\alpha_0, \beta_0)$ and $(\rho\alpha_0, \rho\beta_0)$, for $\rho \neq 0$, give the same line in $\mathcal{F}_A$.

Exercise 15.4.3 The cartesian equation for the bundle $\mathcal{F}_A$ of lines through $A = (1, -2)$ in $\mathbb{E}^2$ is

\[ \Sigma_{\mathcal{F}_A} : \alpha(x - 1) + \beta(y + 2) = 0. \]

The result described in the next proposition (whose proof we omit) shows that the bundle $\mathcal{F}_A$ can be generated by any pair of distinct lines concurrent in $A$.

Proposition 15.4.4 Let $A \in \mathbb{E}^2$ be the unique intersection of the lines

\[ \Sigma_r : ax + by + c = 0, \qquad \Sigma_{r'} : a'x + b'y + c' = 0. \]

Any relation

\[ \Sigma_{(\alpha, \beta)} : \alpha(ax + by + c) + \beta(a'x + b'y + c') = 0 \]

with $\mathbb{R}^2 \ni (\alpha, \beta) \neq (0, 0)$ is the cartesian equation for a line in the bundle $\mathcal{F}_A$ of lines with center $A$, and for any element $s$ of $\mathcal{F}_A$ there exists a pair $\mathbb{R}^2 \ni (\alpha, \beta) \neq (0, 0)$ such that the cartesian equation of $s$ can be written as

\[ \Sigma_{(\alpha, \beta)} : \alpha(ax + by + c) + \beta(a'x + b'y + c') = 0. \tag{15.1} \]

Definition 15.4.5 If the bundle $\mathcal{F}_A$ is given by (15.1), the distinct lines $r, r'$ are called the generators of the bundle. To stress the role of the generating lines, we also write in such a case $\mathcal{F}_A = \mathcal{F}(r, r')$.

Exercise 15.4.6 The line $r$ whose cartesian equation is $\Sigma_r : x + y + 1 = 0$ is an element in the bundle $\mathcal{F}_A$ in Exercise 15.4.3, corresponding to the parameters $(\alpha, \beta) = (1, 1)$ or equivalently $(\alpha, \beta) = (\rho, \rho)$ with $\rho \neq 0$.

Exercise 15.4.7 Consider the following cartesian equation,

\[ \Sigma_{(\alpha, \beta)} : \alpha(x - y + 3) + \beta(2x + y + 3) = 0, \]

depending on a pair of real parameters $(\alpha, \beta) \neq (0, 0)$. Since the relations $x - y + 3 = 0$ and $2x + y + 3 = 0$ yield the cartesian equations for a pair of non parallel lines in $\mathbb{E}^2$, the equation $\Sigma_{(\alpha, \beta)}$ is the cartesian equation for a bundle $\mathcal{F}$ of lines in $\mathbb{E}^2$. We compute:

(a) the centre $A$ of the bundle $\mathcal{F}$,

(b) the line $s_1 \in \mathcal{F}$ which is orthogonal to the line $r_1$ whose cartesian equation is $\Sigma_{r_1} : 3x + y - 1 = 0$,

(c) the line $s_2 \in \mathcal{F}$ which is parallel to the line $r_2$ whose cartesian equation is $\Sigma_{r_2} : x - y = 0$,

(d) the line $s_3 \in \mathcal{F}$ through the point $B = (1, 1)$.

We proceed as follows:

(a) The centre of the bundle is given by the intersection

\[
\begin{cases} x - y + 3 = 0 \\ 2x + y + 3 = 0 \end{cases}
\]

which is found to be $A = (-2, 1)$.

(b) We write the cartesian equation of the bundle $\mathcal{F}$ as

\[ \Sigma_{\mathcal{F}} : (\alpha + 2\beta)x + (-\alpha + \beta)y + 3(\alpha + \beta) = 0. \]

As a consequence, the vector $(\alpha + 2\beta, \beta - \alpha)$ is orthogonal to the direction of an arbitrary line in the bundle $\mathcal{F}$ (see Remark 15.1.6), so that this direction is spanned by the vector $v_{(\alpha, \beta)} = (\alpha - \beta, \alpha + 2\beta)$. In order for the line $s_1 \in \mathcal{F}$ to be orthogonal to $r_1$, whose direction is $\mathcal{L}((-1, 3))$, we require

\[ (\alpha - \beta, \alpha + 2\beta) \cdot (-1, 3) = 0 \quad \Rightarrow \quad 2\alpha + 7\beta = 0 \quad \Rightarrow \quad (\alpha, \beta) = \rho(7, -2) \]

with $\rho \neq 0$. The line $s_1$ has the cartesian equation $\Sigma_{(7, -2)} : x - 3y + 5 = 0$.

(c) In order for an element $s_2 \in \mathcal{F}$ to be parallel to $r_2$ we require that its direction coincides with the direction of $r_2$, which is $\mathcal{L}((1, 1))$. We impose then

\[ \alpha - \beta = \alpha + 2\beta \quad \Rightarrow \quad (\alpha, \beta) = \rho(1, 0) \]

with $\rho \neq 0$. So we have that $s_2$ is given by the cartesian equation $\Sigma_{(1, 0)} : x - y + 3 = 0$. The line $s_2$ turns out to be indeed one of the generators of the bundle $\mathcal{F}$.

(d) We have now to require that the coordinates of $B$ solve the equation $\Sigma_{(\alpha, \beta)}$, that is

\[ (\alpha + 2\beta) + (\beta - \alpha) + 3(\alpha + \beta) = 0 \quad \Rightarrow \quad 3\alpha + 6\beta = 0, \]

giving $(\alpha, \beta) = \rho(2, -1)$ with $\rho \neq 0$. The line $s_3$ is therefore given by

\[ \Sigma_{(2, -1)} : y - 1 = 0. \]
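
The computations of Exercise 15.4.7 can be reproduced with a few helper functions. This is a minimal sketch in which a line $ax + by + c = 0$ is stored as the triple $(a, b, c)$; the names `bundle_member`, `centre` and `member_through` are illustrative and not taken from the text.

```python
from fractions import Fraction


def bundle_member(alpha, beta, g1, g2):
    """Coefficients (a, b, c) of the line alpha*g1 + beta*g2 in the bundle generated by g1, g2."""
    return tuple(alpha * p + beta * q for p, q in zip(g1, g2))


def centre(g1, g2):
    """Common point of two non parallel lines, by Cramer's rule."""
    (a, b, c), (a2, b2, c2) = g1, g2
    det = a * b2 - a2 * b
    return (Fraction(b * c2 - b2 * c, det), Fraction(a2 * c - a * c2, det))


def member_through(point, g1, g2):
    """A pair (alpha, beta) selecting the line of the bundle through a point other than the centre."""
    e1 = g1[0] * point[0] + g1[1] * point[1] + g1[2]
    e2 = g2[0] * point[0] + g2[1] * point[1] + g2[2]
    return (e2, -e1)  # solves alpha*e1 + beta*e2 = 0


g1, g2 = (1, -1, 3), (2, 1, 3)            # generators x - y + 3 = 0 and 2x + y + 3 = 0
A = centre(g1, g2)
pair_B = member_through((1, 1), g1, g2)   # proportional to (2, -1)
s1 = bundle_member(7, -2, g1, g2)         # 3x - 9y + 15 = 0, i.e. x - 3y + 5 = 0
s3 = bundle_member(*pair_B, g1, g2)       # -9y + 9 = 0, i.e. y - 1 = 0
print(A, pair_B, s1, s3)
```

Since the parameters are defined up to a common non zero factor, the pair returned by `member_through` is only one representative of the line through the given point.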

Remark 15.4.8 Notice that the computations in (d) above can be generalised. If $\mathcal{F}_A$ is a bundle of lines through $A$, for any point $B \neq A$ there always exists a unique line in $\mathcal{F}_A$ which passes through $B$. We denote it as the line $r_{AB} \in \mathcal{F}_A$.

Definition 15.4.9 Let $\Sigma_r : ax + by + c = 0$ be the cartesian equation of the line $r$ in $E^2$. The set of all lines which are parallel to $r$ is said to define a bundle of parallel lines or an improper bundle. The most convenient way to describe an improper bundle of lines is

$$ax + by + h = 0, \quad \text{with } h \in \mathbb{R}.$$

Exercise 15.4.10 We consider the line $r$ in $E^2$ given by $\Sigma_r : 2x - y + 3 = 0$. We wish to determine the lines $s$ which are parallel to $r$ and whose distance from $r$ is $d(s, r) = \sqrt{5}$.

The lines parallel to $r$ are the elements $r_h$ of the improper bundle $\mathcal{F}$ whose cartesian equation is

$$\Sigma_{r_h} : 2x - y + h = 0.$$

From Proposition 15.3.15 we have

$$\sqrt{5} = d(r_h, r) = \frac{|h - 3|}{\sqrt{5}} \;\Rightarrow\; |h - 3| = 5 \;\Rightarrow\; h - 3 = \pm 5.$$

The solutions of the exercise are

$$\Sigma_{r_8} : 2x - y + 8 = 0, \qquad \Sigma_{r_{-2}} : 2x - y - 2 = 0.$$
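The computation in Exercise 15.4.10 can be checked numerically. The following is a minimal Python sketch, assuming that the distance between the parallel lines $ax + by + c = 0$ and $ax + by + h = 0$ is $|h - c|/\sqrt{a^2 + b^2}$, as used above. The function name `parallel_lines_at_distance` is our own choice and does not come from the text.

```python
import math

def parallel_lines_at_distance(a, b, c, d):
    """Constant terms h of the lines a*x + b*y + h = 0 lying at
    distance d from the line a*x + b*y + c = 0."""
    norm = math.hypot(a, b)          # length of the normal vector (a, b)
    # |h - c| / norm = d  gives  h = c + d*norm  or  h = c - d*norm
    return (c + d * norm, c - d * norm)

# Data of Exercise 15.4.10: r : 2x - y + 3 = 0 and d = sqrt(5)
h_plus, h_minus = parallel_lines_at_distance(2, -1, 3, math.sqrt(5))
```

Up to floating point rounding, the two values reproduce the constant terms of the two lines found in the exercise.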


In a way similar to the above, one has the notion of a bundle of planes in a three dimensional affine space.

Definition 15.4.11 Let $r$ be a line in $E^3$. The bundle $\mathcal{F}_r$ of planes through $r$ is the set of all planes $\pi$ in $E^3$ which contain $r$, that is $r \subset \pi$. The line $r$ is called the carrier of the bundle $\mathcal{F}_r$.

Moreover, if $\pi$ is a plane in $E^3$, the set of all planes in $E^3$ which are parallel to $\pi$ gives the (improper) bundle of planes parallel to $\pi$.

The following proposition is the analogue of Proposition 15.4.2.

Proposition 15.4.12 Let $r$ be the line in $E^3$ with cartesian equation given by

$$r : \begin{cases} ax + by + cz + d = 0 \\ a'x + b'y + c'z + d' = 0 \end{cases}$$

For any choice of the parameters $(\alpha, \beta) \neq (0, 0)$ the relation

$$\alpha(ax + by + cz + d) + \beta(a'x + b'y + c'z + d') = 0 \qquad (15.2)$$

yields the cartesian equation for a plane in the bundle $\mathcal{F}_r$ with carrier line $r$, and for any plane $\pi$ in such a bundle there is a pair $(\alpha, \beta) \neq (0, 0)$ such that the cartesian equation of $\pi$ is given by (15.2).

Definition 15.4.13 If the bundle $\mathcal{F}_r$ of planes is given by the cartesian equation (15.2), the planes $\Sigma_\pi : ax + by + cz + d = 0$ and $\Sigma_{\pi'} : a'x + b'y + c'z + d' = 0$ are called the generators of $\mathcal{F}_r$. In such a case the equivalent notation $\mathcal{F}(\pi, \pi')$ will also be used.

Remark 15.4.14 Clearly, the bundle $\mathcal{F}_r$ is generated by any two distinct planes $\pi, \pi'$ through $r$.

Exercise 15.4.15 Given the line $r$ whose vector equation is

$$r : (x, y, z) = (1, 2, -1) + \lambda(2, 3, 1),$$

we determine the bundle $\mathcal{F}_r$ of planes through $r$. In order to obtain a cartesian equation for $r$, we eliminate the parameter $\lambda$ from the vector equation above, as follows

$$\begin{cases} x = 1 + 2\lambda \\ y = 2 + 3\lambda \\ \lambda = z + 1 \end{cases} \;\Rightarrow\; \begin{cases} x = 1 + 2(z + 1) \\ y = 2 + 3(z + 1) \end{cases} \;\Rightarrow\; r : \begin{cases} x - 2z - 3 = 0 \\ y - 3z - 5 = 0 \end{cases}$$

The cartesian equation for the bundle is then given by

$$\Sigma_{\mathcal{F}_r} : \alpha(x - 2z - 3) + \beta(y - 3z - 5) = 0$$

with any $(\alpha, \beta) \neq (0, 0)$.

Let us next find the plane $\pi \in \mathcal{F}_r$ which passes through $A = (1, 2, 3)$. The condition $A \in \pi$ yields

$$\alpha(1 - 6 - 3) + \beta(2 - 9 - 5) = 0 \;\Rightarrow\; 2\alpha + 3\beta = 0.$$

We can pick $(\alpha, \beta) = (3, -2)$, giving $\Sigma_\pi : 3(x - 2z - 3) - 2(y - 3z - 5) = 0$, that is $\Sigma_\pi : 3x - 2y + 1 = 0$.

We also find the plane $\sigma \in \mathcal{F}_r$ which is orthogonal to $v = (1, -1, 1)$. We know that a vector orthogonal to a plane $\pi \in \mathcal{F}_r$ with equation

$$\Sigma_\pi : \alpha x + \beta y - (2\alpha + 3\beta)z - 3\alpha - 5\beta = 0$$

is given by $(\alpha, \beta, -2\alpha - 3\beta)$. Requiring this vector to be proportional to $v$, the conditions we have to meet are then

$$\begin{cases} \alpha = -\beta \\ \alpha = -2\alpha - 3\beta \end{cases} \;\Rightarrow\; \alpha = -\beta.$$

If we fix $(\alpha, \beta) = (1, -1)$, we have $\Sigma_\sigma : (x - 2z - 3) - (y - 3z - 5) = 0$, that is

$$\Sigma_\sigma : x - y + z + 2 = 0.$$
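The selection of the plane of a bundle passing through a given point can be sketched in a few lines of Python. This is only an illustration under our own conventions: a plane $ax + by + cz + d = 0$ is stored as the tuple `(a, b, c, d)`, and the names `evaluate` and `plane_in_bundle_through` are hypothetical. The choice $(\alpha, \beta) = (g_2(A), -g_1(A))$, where $g_1, g_2$ are the generators evaluated at $A$, is one convenient solution of the linear condition $\alpha\, g_1(A) + \beta\, g_2(A) = 0$.

```python
def evaluate(plane, point):
    """Value of a*x + b*y + c*z + d at the given point."""
    a, b, c, d = plane
    x, y, z = point
    return a * x + b * y + c * z + d

def plane_in_bundle_through(gen1, gen2, point):
    """Coefficients of the plane alpha*gen1 + beta*gen2 = 0 through point."""
    alpha = evaluate(gen2, point)
    beta = -evaluate(gen1, point)
    if alpha == 0 and beta == 0:
        # the point lies on the carrier line: every plane of the bundle works
        raise ValueError("point lies on the carrier line")
    return tuple(alpha * u + beta * v for u, v in zip(gen1, gen2))

# Generators of Exercise 15.4.15: x - 2z - 3 = 0 and y - 3z - 5 = 0
gen1 = (1, 0, -2, -3)
gen2 = (0, 1, -3, -5)
plane = plane_in_bundle_through(gen1, gen2, (1, 2, 3))
```

The result is a multiple of $(3, -2, 0, 1)$, that is of the plane $3x - 2y + 1 = 0$ found in the exercise.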


15.5  Symmetries 

We  introduce  a  few  notions  related  to  symmetries  which  are  useful  to  solve  problems 
in  several  branches  of  geometry  and  physics. 

Definition 15.5.1 Consider a point $C \in E^n$.

(a) Let $P \in E^n$ be an arbitrary point in $E^n$. The symmetric point to $P$ with respect to $C$ is the element $P' \in E^n$ that belongs to the line $r_{CP}$ passing through $C$ and $P$, and such that $d(P', C) = d(P, C)$ with $P' \neq P$.

(b) Let $X \subset E^n$ be a set of points. The symmetric points to $X$ with respect to $C$ is the set $X' \subset E^n$ given by every point $P'$ which is symmetric to any $P$ in $X$ with respect to $C$.

(c) Let $X \subset E^n$. We say that $X$ is symmetric with respect to $C$ if $X = X'$, that is if $X$ contains the symmetric point (with respect to $C$) to any of its points. In such a case, $C$ is called a symmetry centre for $X$.

Exercise 15.5.2 In the euclidean affine plane $E^2$ consider the point $C = (2, 3)$. Given the point $P = (1, -1)$, we determine its symmetric $P'$ with respect to $C$. Likewise, given the line $\Sigma_r : 2x - y - 3 = 0$, we determine its symmetric $r'$ with respect to $C$.

We consider the line $r_{CP}$ through $P$ and $C$, which has the vector equation

$$r_{CP} : (x, y) = (1, -1) + \lambda(1, 4).$$

The distance between $P$ and $C$ is given by $\|P - C\| = \sqrt{17}$, so the point $P'$ can be obtained by finding the value for the parameter $\lambda$ such that the distance

$$\|P_\lambda - C\| = \|(-1 + \lambda, -4 + 4\lambda)\| = \sqrt{(-1 + \lambda)^2 + (-4 + 4\lambda)^2}$$

is equal to $\|P - C\|$. We have then

$$\sqrt{(-1 + \lambda)^2 + 16(-1 + \lambda)^2} = \sqrt{17} \;\Rightarrow\; \sqrt{17}\,\sqrt{(-1 + \lambda)^2} = \sqrt{17} \;\Rightarrow\; \sqrt{(-1 + \lambda)^2} = 1,$$

that is $|-1 + \lambda| = 1$, giving $\lambda = 2$ or $\lambda = 0$. For $\lambda = 0$ we have $P_{\lambda=0} = P$, so $P' = P_{\lambda=2} = (3, 7)$.

In order to determine $r'$ we observe that $P \in r$ and we claim that, since $r$ is a line, the set $r'$ symmetric to $r$ with respect to $C$ is a line as well. It is then sufficient to write the line through $P'$ and another point $Q'$ which is symmetric to some $Q \in r$ with respect to $C$. By choosing $Q = (0, -3) \in r$, it is immediate to compute, with the same steps as above, that $Q' = (4, 9)$. We conclude that $r' = r_{P'Q'}$, with vector equation

$$r' : (x = 3 + \lambda, \; y = 7 + 2\lambda).$$
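The two symmetric points of Exercise 15.5.2 can be checked with a short Python sketch. It relies on the shortcut $P' = 2C - P$, which expresses that $C$ is the midpoint of the segment $PP'$. This is a reformulation of ours and not the parametric computation carried out above, and the function name `symmetric_point` is hypothetical.

```python
def symmetric_point(point, centre):
    """Point symmetric to `point` with respect to `centre`: P' = 2C - P."""
    return tuple(2 * c - p for p, c in zip(point, centre))

C = (2, 3)
P_prime = symmetric_point((1, -1), C)   # symmetric of P = (1, -1)
Q_prime = symmetric_point((0, -3), C)   # symmetric of Q = (0, -3)

def on_symmetric_line(point):
    """Check membership in the line through (3, 7) with direction (1, 2),
    written in cartesian form as 2x - y + 1 = 0."""
    x, y = point
    return 2 * x - y + 1 == 0
```

Both computed points lie on the line $r'$ obtained in the exercise, and the formula works unchanged in $E^n$ for any $n$.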

Definition 15.5.3 Let $A, B$ be points in $E^n$. The midpoint $M_{AB}$ of the line segment $AB$ is the (unique) point of the line $r_{AB}$ with $\|M_{AB} - A\| = \|M_{AB} - B\|$.

Notice that $A$ is the symmetric point to $B$ with respect to $M_{AB}$, and clearly $B$ is the symmetric point to $A$ with respect to $M_{AB}$, with $M_{AB} = M_{BA}$. One indeed has the vector equality $A - M_{AB} = M_{AB} - B$, giving

$$M_{AB} = \frac{A + B}{2}.$$

The set $H_{AB}$ given by the points

$$H_{AB} = \{P \in E^n : \|P - A\| = \|P - B\|\}$$

can be shown to be the hyperplane passing through $M_{AB}$ and orthogonal to the line segment $AB$. The set $H_{AB}$ is called the bisecting hyperplane of the line segment $AB$. In $E^2$ it is the bisecting line of $AB$, while in $E^3$ it is the bisecting plane of $AB$.

Exercise 15.5.4 Consider the line segment in $E^2$ whose endpoints are $A = (1, 2)$ and $B = (3, 4)$. Its midpoint is given by

$$M_{AB} = \frac{A + B}{2} = \frac{(1, 2) + (3, 4)}{2} = (2, 3).$$

A point $P = (x, y)$ belongs to the bisecting line if $\|P - A\|^2 = (x - 1)^2 + (y - 2)^2$ equals $\|P - B\|^2 = (x - 3)^2 + (y - 4)^2$, which gives

$$(x - 1)^2 + (y - 2)^2 = (x - 3)^2 + (y - 4)^2 \;\Rightarrow\; -2x + 1 - 4y + 4 = -6x + 9 - 8y + 16,$$

that is $\Sigma_{H_{AB}} : x + y - 5 = 0$. It is immediate to check that $M_{AB} \in H_{AB}$. The direction of the bisecting line is spanned by $(1, -1)$, which is orthogonal to the vector $B - A = (2, 2)$ spanning the direction of the line $r_{AB}$.

Exercise 15.5.5 Consider the points $A = (1, 2, -1)$ and $B = (3, 0, 1)$ in $E^3$. The corresponding midpoint is

$$M_{AB} = \frac{A + B}{2} = \frac{(1, 2, -1) + (3, 0, 1)}{2} = (2, 1, 0).$$

The bisecting plane $H_{AB}$ is given by the points $P = (x, y, z)$ fulfilling the condition

$$(x - 1)^2 + (y - 2)^2 + (z + 1)^2 = \|P - A\|^2 = \|P - B\|^2 = (x - 3)^2 + y^2 + (z - 1)^2,$$

which gives

$$x - y + z - 1 = 0.$$

The bisecting plane is then orthogonal to $(1, -1, 1)$, with $r_{AB}$ having a direction vector given by $B - A = (2, -2, 2)$.
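Expanding $\|P - A\|^2 = \|P - B\|^2$ in general gives $(B - A) \cdot P + \tfrac{1}{2}(\|A\|^2 - \|B\|^2) = 0$, which covers both of the previous exercises at once. The Python sketch below implements this formula. It is an illustration of ours, assuming integer or rational coordinates, and the name `bisecting_hyperplane` is hypothetical.

```python
from fractions import Fraction

def bisecting_hyperplane(A, B):
    """Return (normal, constant) such that the bisecting hyperplane of the
    segment AB is the set of points P with  normal . P + constant = 0."""
    normal = tuple(b - a for a, b in zip(A, B))            # B - A
    constant = Fraction(sum(a * a for a in A) - sum(b * b for b in B), 2)
    return normal, constant

line = bisecting_hyperplane((1, 2), (3, 4))            # Exercise 15.5.4
plane = bisecting_hyperplane((1, 2, -1), (3, 0, 1))    # Exercise 15.5.5
```

The outputs correspond to $2x + 2y - 10 = 0$ and $2x - 2y + 2z - 2 = 0$, which are the equations found above multiplied by 2.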

Having defined the notion of symmetry of a set in $E^n$ with respect to a point, we might wonder about a meaningful definition of symmetry of a set with respect to an arbitrary linear affine variety in $E^n$. Such a task turns out to be quite hard in general, so we focus on the easy case of defining only the notion of symmetry with respect to a hyperplane.

Firstly, a general definition.

Definition 15.5.6 Let $H \subset E^n$ be a hyperplane.

(a) Let $P \in E^n$ be an arbitrary point in $E^n$. The symmetric point to $P$ with respect to $H$ is the element $P' \in E^n$ such that $H$ is the bisecting hyperplane of the line segment $PP'$.

(b) Let $X \subset E^n$ be a set of points. The symmetric points to $X$ with respect to $H$ is the set $X' \subset E^n$ given by every point $P'$ which is symmetric to any $P$ in $X$ with respect to $H$.

(c) Let $X \subset E^n$. We say that $X$ is symmetric with respect to $H$ if $X = X'$, that is if $X$ contains the symmetric point (with respect to $H$) to any of its points. In such a case, $H$ is called a symmetry hyperplane for $X$.

Remark 15.5.7 Notice that if $P'$ is the symmetric point to $P$ with respect to the hyperplane $H$, then the line $r_{PP'}$ is orthogonal to $H$ and $d(P', H) = d(P, H)$.

We finish with some examples on the simplest cases in $E^2$ and $E^3$.

Exercise 15.5.8 A line is a hyperplane in $E^2$. Given the point $P = (1, 2)$ we determine its symmetric $P'$ with respect to the line $r$ whose equation is $\Sigma_r : 2x + y - 2 = 0$.

We observe that if $t$ is the line through $P$ which is orthogonal to $r$, then $P'$ is the point in $t$ fixed by the condition $d(P, r) = d(P', r)$. The direction of $t$ is clearly spanned by the vector $(2, 1)$, so

$$t : \begin{cases} x = 1 + 2\lambda \\ y = 2 + \lambda \end{cases}$$

and the points in $t$ can be written as $Q_\lambda = (1 + 2\lambda, 2 + \lambda)$. By setting

$$d(Q_\lambda, r) = d(P, r) \;\Leftrightarrow\; \frac{|2(1 + 2\lambda) + (2 + \lambda) - 2|}{\sqrt{4 + 1}} = \frac{|2 + 2 - 2|}{\sqrt{4 + 1}} \;\Leftrightarrow\; |5\lambda + 2| = 2$$

we see that $Q_{\lambda=0} = P$, while $Q_{\lambda=-4/5} = P' = \tfrac{1}{5}(-3, 6)$.

Exercise 15.5.9 Given $P = (0, 1, -2) \in E^3$, we determine its symmetric $P'$ with respect to the hyperplane $\pi$ (which is indeed a plane, since we are in $E^3$) whose equation is $\Sigma_\pi : 2x + 4y + 4z - 5 = 0$.

We firstly find the line $t$ through $P$ which is orthogonal to $\pi$. The orthogonal subspace to $\pi$ is spanned by the vector $(2, 4, 4)$ or equivalently $(1, 2, 2)$, so the line $t$ has parametric equation

$$t : \begin{cases} x = \lambda \\ y = 1 + 2\lambda \\ z = -2 + 2\lambda \end{cases}$$

Since for the symmetric point $P'$ it is $d(P, \pi) = d(P', \pi)$, we label a point $Q$ in $t$ by the parameter $\lambda$ as $Q_\lambda = (\lambda, 1 + 2\lambda, -2 + 2\lambda)$ and impose

$$d(Q_\lambda, \pi) = d(P, \pi) \;\Leftrightarrow\; \frac{|2\lambda + 4(1 + 2\lambda) + 4(-2 + 2\lambda) - 5|}{\sqrt{36}} = \frac{|4 - 8 - 5|}{\sqrt{36}} \;\Leftrightarrow\; |18\lambda - 9| = 9.$$

We see that $P = Q_{\lambda=0}$ and $P' = Q_{\lambda=1} = (1, 3, 0)$.
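Solving the distance condition along the orthogonal line for a generic hyperplane $n \cdot P + c = 0$ gives the closed formula $P' = P - 2\,\frac{n \cdot P + c}{n \cdot n}\, n$. The Python sketch below uses this formula. It is a shortcut of ours rather than the step by step method of the exercises, it assumes integer or rational data, and the name `reflect` is hypothetical.

```python
from fractions import Fraction

def reflect(point, normal, c):
    """Symmetric point of `point` with respect to the hyperplane
    normal . P + c = 0, computed as P - 2 (n.P + c)/(n.n) n."""
    value = sum(n * p for n, p in zip(normal, point)) + c   # n.P + c
    norm_sq = sum(n * n for n in normal)                    # n.n
    factor = Fraction(2 * value, norm_sq)
    return tuple(p - factor * n for p, n in zip(point, normal))

# Exercise 15.5.8: P = (1, 2) and the line 2x + y - 2 = 0
P_prime_2d = reflect((1, 2), (2, 1), -2)
# Exercise 15.5.9: P = (0, 1, -2) and the plane 2x + 4y + 4z - 5 = 0
P_prime_3d = reflect((0, 1, -2), (2, 4, 4), -5)
```

Reflecting twice returns the starting point, which is a quick sanity check that the map is a symmetry.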

Exercise 15.5.10 In $E^3$ let us determine the line $r'$ which is symmetric to the line with equation $r : (x, y, z) = (0, 1, -2) + \mu(1, 0, 0)$ with respect to the plane $\pi$ with equation $\Sigma_\pi : 2x + 4y + 4z - 5 = 0$.

The plane $\pi$ is the same plane we considered in the previous exercise. Its orthogonal space is spanned by the vector $(1, 2, 2)$. By labelling a point of the line $r$ as $P_\mu = (\mu, 1, -2)$, we find the line $t_\mu$ which passes through $P_\mu$ and is orthogonal to $\pi$. A parametric equation for $t_\mu$ is given by

$$t_\mu : \begin{cases} x = \mu + \lambda \\ y = 1 + 2\lambda \\ z = -2 + 2\lambda \end{cases}$$

We then label the points $Q$ in $t_\mu$ by writing $Q_{\lambda,\mu} = (\mu + \lambda, 1 + 2\lambda, -2 + 2\lambda)$. We require

$$d(Q_{\lambda,\mu}, \pi) = d(P_\mu, \pi)$$

as a condition to determine $\lambda$, since $\mu$ will yield a parameter for the line $r'$. We have

$$d(Q_{\lambda,\mu}, \pi) = \frac{|2(\mu + \lambda) + 4(1 + 2\lambda) + 4(-2 + 2\lambda) - 5|}{\sqrt{36}}, \qquad d(P_\mu, \pi) = \frac{|2\mu + 4 - 8 - 5|}{\sqrt{36}}.$$

From $d(Q_{\lambda,\mu}, \pi) = d(P_\mu, \pi)$ we have

$$|2\mu + 18\lambda - 9| = |2\mu - 9| \;\Leftrightarrow\; 2\mu + 18\lambda - 9 = \pm(2\mu - 9).$$

For $\lambda = 0$ we recover $Q_{\lambda=0,\mu} = P_\mu$. The other solution is $\lambda = -\tfrac{2}{9}\mu + 1$, giving

$$Q_{\lambda=-(2/9)\mu+1,\,\mu} = \left(\tfrac{7}{9}\mu + 1, \; -\tfrac{4}{9}\mu + 3, \; -\tfrac{4}{9}\mu\right).$$

By a rescaling of the parameter $\mu$, a vector equation for the line $r'$ can be written as

$$r' : (x, y, z) = (1, 3, 0) + \mu(7, -4, -4).$$

Exercise 15.5.11 Consider the set $X \subset E^2$ given by

$$X = \{(x, y) \in E^2 : y = 5x^2\}$$

and the line $r$ whose cartesian equation is $\Sigma_r : x = 0$. We wish to show that $r$ is a symmetry axis for $X$, that is $X$ is symmetric with respect to $r$. We have then to prove that each point $P'$, symmetric to any point $P \in X$ with respect to $r$, is an element in $X$.

Let us consider a generic $P = (x_0, y_0) \in X$ and determine its symmetric with respect to $r$. The line $t$ through $P$ which is orthogonal to $r$ has the following parametric equation

$$t : \begin{cases} x = x_0 + \lambda \\ y = y_0 \end{cases}$$

A point in $t$ is then labelled $P_\lambda = (x_0 + \lambda, y_0)$. For its distance from $r$ we compute $d(P_\lambda, r) = |x_0 + \lambda|$, while $d(P, r) = |x_0|$. By imposing that these two distances coincide, we have

$$d(P_\lambda, r) = d(P, r) \;\Leftrightarrow\; |x_0 + \lambda| = |x_0| \;\Leftrightarrow\; (x_0 + \lambda)^2 = x_0^2 \;\Leftrightarrow\; \lambda(2x_0 + \lambda) = 0.$$

The solution $\lambda = 0$ corresponds to $P$, the solution $\lambda = -2x_0$ yields $P' = (-x_0, y_0)$. Such calculations do not depend on the fact that $P$ is an element in $X$. If we consider only points $P$ in $X$, we have to require that $y_0 = 5x_0^2$. It follows that $y_0 = 5(-x_0)^2$, that is $P' \in X$.


Chapter 16

Conic Sections

This chapter is devoted to conics. We shall describe at length their algebraic and geometric properties and their use in physics, notably for the Kepler laws for the motion of celestial bodies.


16.1  Conic  Sections  as  Geometric  Loci 


The conic sections (or simply conics) are parabolae, ellipses (with circles as limiting case) and hyperbolae. They are also known as geometric loci, that is collections of points $P = (x, y) \in E^2$ satisfying one or more conditions, or determined by such conditions. The following three relations, whose origins we briefly recall, should be well known

$$x^2 = 2py, \qquad \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \qquad (16.1)$$


Definition 16.1.1 (Parabolae) Given a straight line $\delta$ and a point $F$ on the plane $E^2$, the set (locus) of points $P$ equidistant from $\delta$ and $F$ is called a parabola. The straight line $\delta$ is the directrix of the parabola, while the point $F$ is the focus of the parabola. This is shown in Fig. 16.1.

Fig. 16.1 The parabola $y = x^2/2p$

Fix a cartesian orthogonal reference system $(O; x, y)$ for $E^2$, with a generic point $P$ having coordinates $P = (x, y)$. Consider the straight line $\delta$ given by the points with equation $y = -p/2$ and the focus $F = (0, p/2)$ (with $p > 0$). The parabola with directrix $\delta$ and focus $F$ is the set of points fulfilling the condition

$$d(P, \delta) = d(P, F). \qquad (16.2)$$

Since the point $P' = (x, -p/2)$ is the orthogonal projection of $P$ onto $\delta$, with $d(P, \delta) = d(P, P')$, the condition (16.2) reads

$$\|P - P'\|^2 = \|P - F\|^2 \;\Rightarrow\; \|(0, y + p/2)\|^2 = \|(x, y - p/2)\|^2,$$

that is

$$(y + p/2)^2 = x^2 + (y - p/2)^2 \;\Rightarrow\; x^2 = 2py.$$

If $C$ is a parabola with focus $F$ and directrix $\delta$ then,

• the straight line through $F$ which is orthogonal to $\delta$ is the axis of $C$,

• the point where the parabola $C$ intersects its axis is the vertex of the parabola.

Definition 16.1.2 (Ellipses) Given two points $F_1$ and $F_2$ on the plane $E^2$, the set (locus) of points $P$ for which the sum of the distances between $P$ and the points $F_1$ and $F_2$ is constant is called an ellipse. The points $F_1$ and $F_2$ are called the foci of the ellipse. This is shown in Fig. 16.2.

Fig. 16.2 The ellipse $x^2/a^2 + y^2/b^2 = 1$

Fix a cartesian orthogonal reference system $(O; x, y)$ for $E^2$, with a generic point $P$ having coordinates $P = (x, y)$. Consider the points $F_1 = (-q, 0)$, $F_2 = (q, 0)$ (with $q > 0$) and $k$ a real parameter such that $k > 2q$. The ellipse with foci $F_1, F_2$ and parameter $k$ is the set of points $P = (x, y)$ fulfilling the condition

$$d(P, F_1) + d(P, F_2) = k. \qquad (16.3)$$

We denote by $A = (a, 0)$ and $B = (0, b)$ the intersections of the ellipse with the positive $x$-axis half-line and the positive $y$-axis half-line, thus $a > 0$ and $b > 0$. From $d(A, F_1) + d(A, F_2) = k$ we have that $k = 2a$; from $d(B, F_1) + d(B, F_2) = k$ we have that $2\sqrt{q^2 + b^2} = k$, so we write

$$k = 2a, \qquad q^2 = a^2 - b^2,$$

with $a > b$. By squaring the condition (16.3) we have

$$\|(x + q, y)\|^2 + \|(x - q, y)\|^2 + 2\,\|(x + q, y)\|\,\|(x - q, y)\| = 4a^2,$$

that is

$$2(x^2 + y^2 + q^2) + 2\sqrt{(x^2 + y^2 + q^2 + 2qx)(x^2 + y^2 + q^2 - 2qx)} = 4a^2$$

that we write as

$$\sqrt{(x^2 + y^2 + q^2)^2 - 4q^2x^2} = 2a^2 - (x^2 + y^2 + q^2).$$

By squaring such a relation we have

$$-q^2x^2 = a^4 - a^2(x^2 + y^2 + q^2) \;\Rightarrow\; (a^2 - q^2)x^2 + a^2y^2 = a^2(a^2 - q^2).$$

Since $q^2 = a^2 - b^2$, the equation of the ellipse depends on the real positive parameters $a, b$ as follows

$$b^2x^2 + a^2y^2 = a^2b^2,$$

which is equivalent to

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$

Notice that, if $q = 0$, that is if $a = b$, the foci $F_1$ and $F_2$ coincide with the origin $O$ of the reference system, and the ellipse reduces to a circle whose equation is

$$x^2 + y^2 = r^2$$

with radius $r = a = b > 0$.

If $C$ is an ellipse with (distinct) foci $F_1$ and $F_2$, then

• the straight line passing through the foci is the major axis of the ellipse,

• the straight line orthogonally bisecting the segment $F_1F_2$ is the minor axis of the ellipse,

• the midpoint of the segment $F_1F_2$ is the centre of the ellipse,

• the four points where the ellipse intersects its axes are the vertices of the ellipse,

• the distance between the centre of the ellipse and the vertices on the major axis (respectively on the minor axis) is called the major semi-axis (respectively minor semi-axis).

Fig. 16.3 The hyperbola $x^2/a^2 - y^2/b^2 = 1$

Definition 16.1.3 (Hyperbolae) Given two points $F_1$ and $F_2$ on the plane $E^2$, the set (locus) of points $P$ for which the absolute difference of the distances $d(P, F_1)$ and $d(P, F_2)$ is constant, is the hyperbola with foci $F_1, F_2$. This is shown in Fig. 16.3.

Fix a cartesian orthogonal reference system $(O; x, y)$ for $E^2$, with a generic point $P$ having coordinates $P = (x, y)$. Consider the points $F_1 = (-q, 0)$, $F_2 = (q, 0)$ (with $q > 0$) and $k$ a real parameter such that $0 < k < 2q$. The hyperbola with foci $F_1, F_2$ and parameter $k$ is the set of points $P = (x, y)$ fulfilling the condition

$$|d(P, F_1) - d(P, F_2)| = k. \qquad (16.4)$$

Notice that, since $k > 0$, such a hyperbola does not intersect the $y$-axis, since the points on the $y$-axis are equidistant from the foci. By denoting by $A = (a, 0)$ (with $a > 0$) the intersection of the hyperbola with the positive $x$-axis, we have

$$k = |d(A, F_1) - d(A, F_2)| = |a + q - |a - q||,$$

which yields $a < q$, since from $a \geq q$ it would follow that $|a - q| = a - q$, giving $k = 2q$. The previous condition then shows that

$$k = |2a| = 2a.$$

By squaring the relation (16.4) we have

$$\|(x + q, y)\|^2 + \|(x - q, y)\|^2 - 2\,\|(x + q, y)\|\,\|(x - q, y)\| = 4a^2,$$
that is

$$2(x^2 + y^2 + q^2) - 2\sqrt{(x^2 + y^2 + q^2 + 2qx)(x^2 + y^2 + q^2 - 2qx)} = 4a^2$$

which we write as

$$\sqrt{(x^2 + y^2 + q^2)^2 - 4q^2x^2} = (x^2 + y^2 + q^2) - 2a^2.$$

By squaring once more, we have

$$-q^2x^2 = a^4 - a^2(x^2 + y^2 + q^2),$$

that reads

$$(a^2 - q^2)x^2 + a^2y^2 = a^2(a^2 - q^2).$$

From $a < q$ we have $q^2 - a^2 > 0$, so we set $q^2 - a^2 = b^2$ and write the previous relation as

$$-b^2x^2 + a^2y^2 = -a^2b^2,$$

which is equivalent to

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1.$$

If $C$ is a hyperbola with foci $F_1$ and $F_2$, then

• the straight line through the foci is the transverse axis of the hyperbola,

• the straight line orthogonally bisecting the segment $F_1F_2$ is the axis of the hyperbola,

• the midpoint of the segment $F_1F_2$ is the centre of the hyperbola,

• the points where the hyperbola intersects its transverse axis are the vertices of the hyperbola,

• the distance between the centre of the hyperbola and its vertices is the transverse semi-axis of the hyperbola.

Remark 16.1.4 The above analysis shows that, if $C$ is a parabola with equation

$$x^2 = 2py,$$

then its directrix is the line $y = -p/2$ and its focus is the point $(0, p/2)$, while the equation

$$y^2 = 2px$$

describes a parabola $C$ with directrix $x = -p/2$ and focus $(p/2, 0)$.

If $C$ is an ellipse with equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

(and $a > b$), then its foci are the points $F_\pm = (\pm\sqrt{a^2 - b^2}, 0)$.

If $C$ is a hyperbola with equation

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$$

then its foci are the points $F_\pm = (\pm\sqrt{a^2 + b^2}, 0)$.
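The foci formulas of Remark 16.1.4 can be tested numerically against the defining conditions (16.3) and (16.4). The Python sketch below is only an illustration. It assumes the standard parametrisations $(a\cos t, b\sin t)$ of the ellipse and $(a\cosh t, b\sinh t)$ of the right branch of the hyperbola, which are not introduced in the text, and the function names are hypothetical.

```python
import math

def ellipse_focal_sum(a, b, t):
    """d(P, F1) + d(P, F2) for the point P = (a cos t, b sin t) of the
    ellipse x^2/a^2 + y^2/b^2 = 1 (a > b), with foci (+-sqrt(a^2 - b^2), 0)."""
    q = math.sqrt(a * a - b * b)
    x, y = a * math.cos(t), b * math.sin(t)
    return math.hypot(x + q, y) + math.hypot(x - q, y)

def hyperbola_focal_difference(a, b, t):
    """|d(P, F1) - d(P, F2)| for the point P = (a cosh t, b sinh t) of the
    hyperbola x^2/a^2 - y^2/b^2 = 1, with foci (+-sqrt(a^2 + b^2), 0)."""
    q = math.sqrt(a * a + b * b)
    x, y = a * math.cosh(t), b * math.sinh(t)
    return abs(math.hypot(x + q, y) - math.hypot(x - q, y))

sum_on_ellipse = ellipse_focal_sum(5, 3, 0.7)             # expected k = 2a = 10
diff_on_hyperbola = hyperbola_focal_difference(3, 4, 0.9) # expected k = 2a = 6
```

In both cases the value agrees with $k = 2a$ up to rounding, whatever the value of the parameter $t$.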

We see that the definition of a parabola requires one single focus and a straight line (not containing the focus), while the definition of an ellipse and of a hyperbola requires two distinct foci and a suitable distance $k$. This apparent diversity can be reconciled. If $F$ is a point in $E^2$ and $\delta$ a straight line with $F \notin \delta$, then one can consider the locus given by points $P$ in $E^2$ fulfilling the condition

$$d(P, F) = e\, d(P, \delta) \qquad (16.5)$$

with $e > 0$. It is clear that, if $e = 1$, this relation defines a parabola with focus $F$ and directrix $\delta$. We shall show later on (in Sect. 16.4 and then Sect. 16.7) that the relation above gives an ellipse for $0 < e < 1$ and a hyperbola if $e > 1$. The parameter $e > 0$ is called the eccentricity of the conic.

Since  symmetry  properties  of  conics  do  not  depend  on  the  reference  system, 
when  dealing  with  symmetries  or  geometric  properties  of  conics  one  can  refer  to  the 
Eqs.  (16.1). 

Remark 16.1.5 With the symmetry notions given in Sect. 15.5, the $y$-axis is a symmetry axis for the parabola $C$ whose equation is $x^2 = 2py$. If $P = (x_0, y_0) \in C$, the symmetric point $P'$ to $P$ with respect to the $y$-axis is $P' = (-x_0, y_0)$, which belongs to $C$ since $2py_0 = (-x_0)^2 = x_0^2$. Furthermore, the axis of a parabola is a symmetry axis and its vertex is equidistant from the focus and the directrix of the parabola.

In a similar way one shows that the axes of an ellipse or of a hyperbola are symmetry axes and the centre is a symmetry centre in both cases. For an ellipse with equation $\alpha x^2 + \beta y^2 = 1$ or a hyperbola with equation $\alpha x^2 - \beta y^2 = 1$ the centre coincides with the origin of the reference system.


16.2  The  Equation  of  a  Conic  in  Matrix  Form 

In the previous section we have shown how, in a given reference system, a parabola, an ellipse and a hyperbola are described by one of the equations in (16.1). But evidently such equations are not the most general ones for the loci we are considering, since they have particular positions with respect to the axes of the reference system.

A common feature of the Eqs. (16.1) is that they are formulated as quadratic polynomials in $x$ and $y$. In the present section we study general quadratic polynomial equations in two variables.

Since to a large extent one does not make use of the euclidean structure given by the scalar product in $E^n$, one can consider the affine plane $\mathbb{A}^2(\mathbb{R})$. By taking complex coordinates, with the canonical inclusion $\mathbb{R}^2 \hookrightarrow \mathbb{C}^2$, one enlarges the real affine plane to the complex one,

$$\mathbb{A}^2(\mathbb{R}) \hookrightarrow \mathbb{A}^2(\mathbb{C}).$$

Definition 16.2.1 A conic section (or simply a conic) is the set of points (locus) whose coordinates $(x, y)$ satisfy a quadratic polynomial equation in the variables $x, y$, that is

$$a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0 \qquad (16.6)$$

with coefficients $a_{ij} \in \mathbb{R}$.

Remark 16.2.2 We notice that

(a) The equations of conics considered in the previous section are particular cases of the general Eq. (16.6). As an example, for the parabola $x^2 = 2py$ we have

$$a_{11} = 1, \quad a_{23} = -p, \quad a_{12} = a_{22} = a_{13} = a_{33} = 0.$$

Notice also that in all the equations considered in the previous section for a parabola or an ellipse or a hyperbola we have $a_{12} = 0$.

(b) There are polynomial equations like (16.6) which do not describe any of the conics presented before: neither a parabola, nor an ellipse or a hyperbola. Consider for example the equation $x^2 - y^2 = 0$, which is factorised as $(x + y)(x - y) = 0$. The set of solutions for such an equation is the union of the two lines with cartesian equations $x + y = 0$ and $x - y = 0$.

Any quadratic polynomial equation (16.6) that can be factorised as

$$(ax + by + c)(a'x + b'y + c') = 0$$

describes the union of two lines. Such lines are not necessarily real. Consider for example the equation $x^2 + y^2 = 0$. Its set of solutions is given only by the point $(0, 0)$ in $\mathbb{A}^2(\mathbb{R})$, while in $\mathbb{A}^2(\mathbb{C})$ we can write $x^2 + y^2 = (x + iy)(x - iy)$, so the conic is the union of the two conjugate lines with cartesian equations $x + iy = 0$ and $x - iy = 0$.

Definition 16.2.3 A conic is called degenerate if it is the union of two lines. Such lines can be either real (coincident or distinct) or complex (in such a case they are also conjugate).

The polynomial equation (16.6) can be written in a more succinct form by means of two symmetric matrices associated with a conic. We set

$$B = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix} \in \mathbb{R}^{2,2}.$$

By introducing these matrices, we write the left-hand side of the Eq. (16.6) as

$$\begin{pmatrix} x & y & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33}. \qquad (16.7)$$

The quadratic homogeneous part of the polynomial defining (16.6) and (16.7) is written as

$$F_C(x, y) = a_{11}x^2 + 2a_{12}xy + a_{22}y^2 = \begin{pmatrix} x & y \end{pmatrix} A \begin{pmatrix} x \\ y \end{pmatrix}.$$

Such an $F_C$ is a quadratic form, called the quadratic form associated to the conic $C$.

Definition 16.2.4 Let $C$ be the conic given by the equation

$$a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0.$$

The matrices

$$B = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}$$

are called respectively the matrix of the coefficients and the matrix of the quadratic form of $C$.

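Exercise 16.2.5 The matrices associated to the parabola with equation $y = 3x^2$, written as $3x^2 - y = 0$, are

$$B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & -1/2 \\ 0 & -1/2 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 3 & 0 \\ 0 & 0 \end{pmatrix}.$$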

Remark 16.2.6 Notice that the six coefficients $a_{ij}$ in (16.6) determine a conic, but a conic is not described by a single array of six coefficients since the equation

$$k a_{11}x^2 + 2k a_{12}xy + k a_{22}y^2 + 2k a_{13}x + 2k a_{23}y + k a_{33} = 0$$

defines the same locus for any $k \in \mathbb{R} \setminus \{0\}$.
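The matrix form (16.7) and the scaling property of Remark 16.2.6 can be illustrated with a short Python sketch. The matrices are stored as plain nested lists, and the names `conic_matrices` and `evaluate_conic` are hypothetical choices of ours, not notation from the text.

```python
from fractions import Fraction

def conic_matrices(a11, a12, a22, a13, a23, a33):
    """Matrix of the coefficients B and matrix of the quadratic form A."""
    B = [[a11, a12, a13],
         [a12, a22, a23],
         [a13, a23, a33]]
    A = [[a11, a12],
         [a12, a22]]
    return B, A

def evaluate_conic(B, x, y):
    """Value of (x y 1) B (x y 1)^T, the left-hand side of (16.6)."""
    v = (x, y, 1)
    return sum(B[i][j] * v[i] * v[j] for i in range(3) for j in range(3))

# Parabola y = 3x^2 of Exercise 16.2.5, written as 3x^2 - y = 0
half = Fraction(1, 2)
B, A = conic_matrices(3, 0, 0, 0, -half, 0)
on_curve = evaluate_conic(B, 2, 12)    # the point (2, 12) satisfies y = 3x^2
off_curve = evaluate_conic(B, 2, 1)    # the point (2, 1) does not

# Multiplying every coefficient by k = 5 rescales the polynomial by 5,
# so its zero locus is unchanged
B_scaled = [[5 * entry for entry in row] for row in B]
```

The polynomial vanishes at a point of the parabola for both `B` and `B_scaled`, while at any other point the two values differ by the factor $k$.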


16.3 Reduction to Canonical Form of a Conic: Translations

A natural question arises. Given a non degenerate conic with equation written as in (16.6) with respect to a reference frame, does there exist a new reference system with respect to which the equation for the conic has a form close to one of those given in (16.1)?

Definition 16.3.1 We call canonical form of a non degenerate conic $C$ one of the following equations for $C$ in a given reference system $(O; x, y)$.

(i) A parabola has equation

$$x^2 = 2py \quad \text{or} \quad y^2 = 2px \qquad (16.8)$$

(ii) A real ellipse has equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \qquad (16.9)$$

while an imaginary ellipse has equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = -1. \qquad (16.10)$$

(iii) A hyperbola has equation

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1 \quad \text{or} \quad \frac{x^2}{a^2} - \frac{y^2}{b^2} = -1 \qquad (16.11)$$


A complete answer to the question above is given in two steps.

One first considers only conics whose equation, in a given reference system $(O; x, y)$, has coefficient $a_{12} = 0$, that is conics whose equation lacks the mixed term $xy$. The reference system $(O'; X, Y)$ for a canonical form is obtained with a translation from $(O; x, y)$.

The general case of a conic whose equation in a given reference system $(O; x, y)$ may have the mixed term $xy$ will require the composition of a rotation and a translation from $(O; x, y)$ to obtain the reference system $(O'; X, Y)$ for a canonical form.

Exercise 16.3.2 Let $\Gamma : y = 2x^2$ describe a parabola in the canonical form, and let us define the following translation on the plane

$$T_{(x_0, y_0)} : \begin{cases} x = X + x_0 \\ y = Y + y_0 \end{cases}$$

The equation for the conic $\Gamma$ with respect to the reference system $(O'; X, Y)$ is then

$$Y = 2X^2 + 4x_0X + 2x_0^2 - y_0.$$

Exercise 16.3.3 Let $\Gamma' : x^2 + 2y^2 = 1$ be an ellipse in the canonical form. Under the translation of the previous example, the equation for $\Gamma'$ with respect to the reference system $(O'; X, Y)$ is

$$X^2 + 2Y^2 + 2x_0X + 4y_0Y + x_0^2 + 2y_0^2 - 1 = 0.$$

Notice that, after the translation by $T_{(x_0, y_0)}$, the equations for the conics $\Gamma$ and $\Gamma'$ are no longer in canonical form, but both still lack the mixed term $xy$. We prove now, with a constructive method, that the converse holds as well.

Exercise 16.3.4 (Completing the squares) Let C be a non degenerate conic whose equation reads, with respect to the reference system (O; x, y),

\[ a_{11} x^2 + a_{22} y^2 + 2a_{13} x + 2a_{23} y + a_{33} = 0. \tag{16.12} \]

Since the polynomial must be quadratic, there are two possibilities. Either both $a_{11}$ and $a_{22}$ are different from zero, or one of them is zero. We then consider:

(I) It is $a_{11} = 0$, $a_{22} \neq 0$ (the case $a_{11} \neq 0$ and $a_{22} = 0$ is analogous).

The Eq. (16.12) is then

\[ a_{22} y^2 + 2a_{23} y + a_{33} + 2a_{13} x = 0. \tag{16.13} \]


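From the algebraic identities:

\[ a_{22} y^2 + 2a_{23} y = a_{22}\left( y^2 + 2\frac{a_{23}}{a_{22}}\, y \right) = a_{22}\left[ \left( y + \frac{a_{23}}{a_{22}} \right)^2 - \left( \frac{a_{23}}{a_{22}} \right)^2 \right] = a_{22}\left( y + \frac{a_{23}}{a_{22}} \right)^2 - \frac{a_{23}^2}{a_{22}} \]

we write the Eq. (16.13) as

\[ a_{22}\left( y + \frac{a_{23}}{a_{22}} \right)^2 - \frac{a_{23}^2}{a_{22}} + a_{33} + 2a_{13} x = 0. \tag{16.14} \]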


Since C is not degenerate, we have $a_{13} \neq 0$ so we write (16.14) as

\[ a_{22}\left( y + \frac{a_{23}}{a_{22}} \right)^2 + 2a_{13}\left( x + \frac{a_{33}a_{22} - a_{23}^2}{2a_{22}a_{13}} \right) = 0 \]

which reads

\[ \left( y + \frac{a_{23}}{a_{22}} \right)^2 = -\frac{2a_{13}}{a_{22}}\left( x + \frac{a_{33}a_{22} - a_{23}^2}{2a_{22}a_{13}} \right). \]

Under the translation

\[ X = x + \frac{a_{33}a_{22} - a_{23}^2}{2a_{22}a_{13}}, \qquad Y = y + \frac{a_{23}}{a_{22}} \]

we get

\[ Y^2 = 2pX \tag{16.15} \]

with $p = -a_{13}/a_{22}$. This is the canonical form (16.8).

If we drop the hypothesis that the conic C is non degenerate, we have $a_{13} = 0$ in the Eq. (16.13). Notice that, for the case $a_{11} = 0$ we are considering, $\det B = -a_{13}^2 a_{22}$. Thus the condition of non degeneracy can be expressed as a condition on the determinant of the matrix of the coefficients, since

\[ a_{13} = 0 \quad \Leftrightarrow \quad \det B = -a_{13}^2 a_{22} = 0. \]

The Eq. (16.14) is then

\[ \left( y + \frac{a_{23}}{a_{22}} \right)^2 = \frac{a_{23}^2 - a_{33}a_{22}}{a_{22}^2} \]

and with the translation

\[ X = x, \qquad Y = y + \frac{a_{23}}{a_{22}} \]

it reads

\[ Y^2 = q \tag{16.16} \]

with $q = (a_{23}^2 - a_{33}a_{22})/a_{22}^2$.

(II) It is $a_{11} \neq 0$, $a_{22} \neq 0$.


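With algebraic manipulation as above, we can write

\[ a_{11} x^2 + 2a_{13} x = a_{11}\left( x + \frac{a_{13}}{a_{11}} \right)^2 - \frac{a_{13}^2}{a_{11}}, \]

\[ a_{22} y^2 + 2a_{23} y = a_{22}\left( y + \frac{a_{23}}{a_{22}} \right)^2 - \frac{a_{23}^2}{a_{22}}. \]

So the Eq. (16.12) is written as

\[ a_{11}\left( x + \frac{a_{13}}{a_{11}} \right)^2 + a_{22}\left( y + \frac{a_{23}}{a_{22}} \right)^2 + a_{33} - \frac{a_{13}^2}{a_{11}} - \frac{a_{23}^2}{a_{22}} = 0. \tag{16.17} \]

If we consider the translation given by

\[ X = x + \frac{a_{13}}{a_{11}}, \qquad Y = y + \frac{a_{23}}{a_{22}}, \]

the conic C has the equation

\[ a_{11} X^2 + a_{22} Y^2 = h, \qquad \text{with} \quad h = -a_{33} + \frac{a_{13}^2}{a_{11}} + \frac{a_{23}^2}{a_{22}}, \tag{16.18} \]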

and $h \neq 0$ since C is non degenerate. The coefficients $a_{11}$ and $a_{22}$ can be either concordant or not. Up to a global factor $(-1)$, we can take $a_{11} > 0$. So we have the following cases.

(IIa) It is $a_{11} > 0$ and $a_{22} > 0$. One distinguishes according to the sign of the coefficient $h$:

• If $h > 0$, the Eq. (16.18) is equivalent to

\[ \frac{a_{11} X^2}{h} + \frac{a_{22} Y^2}{h} = 1. \]

Since $a_{11}/h > 0$ and $a_{22}/h > 0$, we have (positive) real numbers $a, b$ by defining $h/a_{11} = a^2$ and $h/a_{22} = b^2$. The Eq. (16.18) is written as

\[ \frac{X^2}{a^2} + \frac{Y^2}{b^2} = 1, \tag{16.19} \]

which is the canonical form of a real ellipse (16.9).

• If $h < 0$, we have $-a_{11}/h > 0$ and $-a_{22}/h > 0$, so we can again introduce (positive) real numbers $a, b$ by $-h/a_{11} = a^2$ and $-h/a_{22} = b^2$. The Eq. (16.18) can be written as

\[ \frac{X^2}{a^2} + \frac{Y^2}{b^2} = -1, \tag{16.20} \]

which is the canonical form of an imaginary ellipse (16.10).

• If $h = 0$ (which means that C is degenerate), we set $1/a_{11} = a^2$ and $1/a_{22} = b^2$ with real numbers $a, b$, so as to get from (16.18) the expression

\[ \frac{X^2}{a^2} + \frac{Y^2}{b^2} = 0. \tag{16.21} \]

(IIb) It is $a_{11} > 0$ and $a_{22} < 0$. Again depending on the sign of the coefficient $h$ we have:

• If $h > 0$, the Eq. (16.18) is

\[ \frac{a_{11} X^2}{h} + \frac{a_{22} Y^2}{h} = 1. \]

Since $a_{11}/h > 0$ and $a_{22}/h < 0$, we can define $h/a_{11} = a^2$ and $-h/a_{22} = b^2$ with $a, b$ positive real numbers. The Eq. (16.18) becomes

\[ \frac{X^2}{a^2} - \frac{Y^2}{b^2} = 1, \tag{16.22} \]

which is the first canonical form in (16.11).

• If $h < 0$, we have $-a_{11}/h > 0$ and $-a_{22}/h < 0$, so we can define $-h/a_{11} = a^2$ and $h/a_{22} = b^2$ with $a, b$ positive real numbers. The Eq. (16.18) becomes

\[ \frac{X^2}{a^2} - \frac{Y^2}{b^2} = -1, \tag{16.23} \]

which is the second canonical form in (16.11).

• If $h = 0$ (that is, C is degenerate), we set $1/a_{11} = a^2$ and $-1/a_{22} = b^2$ with $a, b$ real numbers, so as to get from (16.18) the expression

\[ \frac{X^2}{a^2} - \frac{Y^2}{b^2} = 0. \tag{16.24} \]

Once again, with B the matrix of the coefficients for C, the identity

\[ \det B = -a_{11} a_{22}\, h \]

shows that the condition of non degeneracy of the conic C is equivalently given by $\det B \neq 0$.
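The case analysis of Exercise 16.3.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reduce_by_translation`, its return convention and the string labels are hypothetical choices, exact rational arithmetic is used so that the tests against zero are meaningful, and the case $a_{11} \neq 0$, $a_{22} = 0$ (analogous in the text) is not implemented.

```python
from fractions import Fraction

def reduce_by_translation(a11, a22, a13, a23, a33):
    """Reduce a11 x^2 + a22 y^2 + 2 a13 x + 2 a23 y + a33 = 0 (no xy term).

    Returns (kind, shift, params); the translation is
    X = x + shift[0], Y = y + shift[1].
    """
    a11, a22, a13, a23, a33 = (Fraction(c) for c in (a11, a22, a13, a23, a33))
    if a11 == 0 and a22 != 0:                      # case (I)
        if a13 != 0:                               # non degenerate: Y^2 = 2 p X
            shift = ((a33 * a22 - a23**2) / (2 * a22 * a13), a23 / a22)
            return "parabola", shift, {"p": -a13 / a22}
        q = (a23**2 - a33 * a22) / a22**2          # degenerate: Y^2 = q
        return "degenerate parabola", (Fraction(0), a23 / a22), {"q": q}
    if a11 != 0 and a22 != 0:                      # case (II): a11 X^2 + a22 Y^2 = h
        shift = (a13 / a11, a23 / a22)
        h = -a33 + a13**2 / a11 + a23**2 / a22
        if a11 * a22 > 0:                          # concordant coefficients
            if h == 0:
                kind = "degenerate ellipse"
            elif h * a11 > 0:                      # h has the sign of a11
                kind = "real ellipse"
            else:
                kind = "imaginary ellipse"
        else:                                      # discordant coefficients
            kind = "degenerate hyperbola" if h == 0 else "hyperbola"
        return kind, shift, {"h": h}
    raise ValueError("the case a11 != 0, a22 = 0 is analogous and not handled")

# x^2 + 4y^2 + 2x - 12y + 3 = 0, i.e. a13 = 1 and a23 = -6
print(reduce_by_translation(1, 4, 1, -6, 3))
```

Rather than first normalising $a_{11} > 0$ by a global sign as in the text, the sketch compares the sign of $h$ with that of $a_{11}$, which amounts to the same distinction.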

The analysis done for the cases of degenerate conics makes it natural to introduce the following definition, which should be compared with the Definition 16.3.1.

We call canonical form of a degenerate conic C one of the following equations for C in a given reference system (O; x, y).

(i) A degenerate parabola has equation

\[ x^2 = q \quad \text{or} \quad y^2 = q. \tag{16.25} \]

(ii) A degenerate ellipse has equation

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 0. \tag{16.26} \]

(iii) A degenerate hyperbola has equation

\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 0. \tag{16.27} \]

Remark 16.3.5 With the definition above, we have that

(i) The conic C with equation $x^2 = q$ is the union of the lines with cartesian equations $x = \pm\sqrt{q}$. If $q > 0$ the lines are real and distinct, if $q < 0$ the lines are complex and conjugate. If $q = 0$ the conic C is the y-axis counted twice. Analogous cases are obtained for the equation $y^2 = q$.

(ii) The equation $b^2 x^2 + a^2 y^2 = 0$ has the unique solution (0, 0) if we consider real coordinates. On the complex affine plane $A^2(\mathbb{C})$ the solutions to such an equation give a degenerate conic C which is the union of two complex conjugate lines, since we can factorise

\[ b^2 x^2 + a^2 y^2 = (bx + iay)(bx - iay). \]

(iii) The solutions to the equation $b^2 x^2 - a^2 y^2 = 0$ give the union of two real and distinct lines, since we can factorise as follows

\[ b^2 x^2 - a^2 y^2 = (bx + ay)(bx - ay). \]


What  we  have  studied  up  to  now  is  the  proof  of  the  following  theorem. 

Theorem 16.3.6 Let C be a conic whose equation, with respect to a reference system (O; x, y), lacks the monomial $xy$. There exists a reference system (O'; X, Y), obtained from (O; x, y) by a translation, with respect to which the equation for the conic C has a canonical form.

Exercise 16.3.7 We consider the conic C with equation

\[ x^2 + 4y^2 + 2x - 12y + 3 = 0. \]

We wish to determine a reference system (O'; X, Y) with respect to which the equation for C is canonical. We complete the squares as follows:

\[ x^2 + 2x = (x + 1)^2 - 1, \]

\[ 4y^2 - 12y = 4\left( y - \tfrac{3}{2} \right)^2 - 9 \]

and write

\[ x^2 + 4y^2 + 2x - 12y + 3 = (x + 1)^2 + 4\left( y - \tfrac{3}{2} \right)^2 - 7. \]

With the translation

\[ X = x + 1, \qquad Y = y - \tfrac{3}{2} \]

the equation for C reads

\[ X^2 + 4Y^2 = 7 \quad \Rightarrow \quad \frac{X^2}{7} + \frac{Y^2}{7/4} = 1. \]

This is an ellipse with centre $(X = 0, Y = 0) = (x = -1, y = 3/2)$, with axes given by the lines $X = 0$ and $Y = 0$, which are $x = -1$ and $y = 3/2$, and semi-axes given by $\sqrt{7}$, $\sqrt{7}/2$.
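As a quick numerical sanity check of Exercise 16.3.7, one can parametrise the canonical ellipse, translate the points back to (O; x, y) and evaluate the original polynomial. The helper names `conic` and `ellipse_point` below are illustrative, and the check is a floating-point one, so the residual is only expected to be close to zero.

```python
import math

def conic(x, y):
    """Left-hand side of x^2 + 4y^2 + 2x - 12y + 3 = 0."""
    return x**2 + 4 * y**2 + 2 * x - 12 * y + 3

def ellipse_point(t):
    """Point of X^2/7 + Y^2/(7/4) = 1, mapped back with x = X - 1, y = Y + 3/2."""
    X = math.sqrt(7) * math.cos(t)
    Y = (math.sqrt(7) / 2) * math.sin(t)
    return X - 1, Y + 1.5

# Sample twelve points along the ellipse and record the largest residual.
residuals = [abs(conic(*ellipse_point(2 * math.pi * k / 12))) for k in range(12)]
max_residual = max(residuals)
print(max_residual)
```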


16.4  Eccentricity:  Part  1 


We have a look now at the relation (16.5) for a particular class of examples. Consider the point $F = (a_x, a_y)$ in $E^2$ and the line $\delta$ whose points satisfy the equation $x = u$, with $u \neq a_x$. The relation $d(P, F) = e\, d(P, \delta)$ (with $e > 0$) is satisfied by the points $P = (x, y)$ whose coordinates are the solutions of the equation

\[ (y - a_y)^2 + (1 - e^2)x^2 + 2(ue^2 - a_x)x + a_x^2 - u^2e^2 = 0. \tag{16.28} \]

We  have  different  cases,  depending  on  the  parameter  e. 

(a) We have already mentioned that for $e = 1$ we are describing the parabola with focus $F$ and directrix $\delta$. Its equation from (16.28) is given by

\[ (y - a_y)^2 + 2(u - a_x)x + a_x^2 - u^2 = 0. \tag{16.29} \]

(b) Assume $e \neq 1$. Using the results of the Exercise 16.3.4, we complete the square and write

\[ (y - a_y)^2 + (1 - e^2)x^2 + 2(ue^2 - a_x)x + a_x^2 - u^2e^2 = 0 \]

or

\[ (y - a_y)^2 + (1 - e^2)\left( x + \frac{ue^2 - a_x}{1 - e^2} \right)^2 - \frac{e^2(u - a_x)^2}{1 - e^2} = 0. \tag{16.30} \]

Then the translation given by

\[ Y = y - a_y, \qquad X = x + \frac{ue^2 - a_x}{1 - e^2} \]

allows us to write, with respect to the reference system (O'; X, Y), the equation as

\[ Y^2 + (1 - e^2)X^2 = \frac{e^2(u - a_x)^2}{1 - e^2}. \]

Depending on the value of $e$, we have the following possibilities.

(b1) If $0 < e < 1$, all the coefficients of the equation are positive, so we have the ellipse

\[ \left( \frac{1 - e^2}{e(u - a_x)} \right)^2 X^2 + \frac{1 - e^2}{e^2(u - a_x)^2}\, Y^2 = 1. \]

An easy computation shows that its foci are given by

\[ F_\pm = \left( \pm \frac{e^2(u - a_x)}{1 - e^2},\ 0 \right) \]

with respect to the reference system (O'; X, Y) and then clearly by

\[ F_+ = (a_x, a_y), \qquad F_- = \left( \frac{a_x + e^2 a_x - 2ue^2}{1 - e^2},\ a_y \right) \]

with respect to (O; x, y). Notice that $F_+ = F$, the starting point.

(b2) If $e > 1$ the equation

\[ \left( \frac{1 - e^2}{e(u - a_x)} \right)^2 X^2 - \frac{e^2 - 1}{e^2(u - a_x)^2}\, Y^2 = 1 \]

represents a hyperbola with foci again given by the points $F_\pm$ written before.
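The passage from the focus-directrix relation to Eq. (16.28), and the formula for the second focus, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the coefficients are stored in the convention $c_{xx} x^2 + c_{yy} y^2 + 2c_x x + 2c_y y + c_0 = 0$, and the sample values (focus $(1, 2)$, directrix $x = 4$, $e = 1/2$) are arbitrary.

```python
import math

def conic_coefficients(ax, ay, u, e):
    """Coefficients (c_xx, c_yy, c_x, c_y, c_0) of Eq. (16.28), expanded."""
    return (1 - e**2, 1.0, u * e**2 - ax, -ay, ax**2 + ay**2 - (u * e)**2)

def evaluate(coeffs, x, y):
    """Value of c_xx x^2 + c_yy y^2 + 2 c_x x + 2 c_y y + c_0 at (x, y)."""
    cxx, cyy, cx, cy, c0 = coeffs
    return cxx * x**2 + cyy * y**2 + 2 * cx * x + 2 * cy * y + c0

def foci(ax, ay, u, e):
    """Foci F+ and F- in (O; x, y) for e != 1, as in cases (b1) and (b2)."""
    if e == 1:
        raise ValueError("e = 1 is the parabola, which has the single focus F")
    f_minus_x = (ax + e**2 * ax - 2 * u * e**2) / (1 - e**2)
    return (ax, ay), (f_minus_x, ay)

ax, ay, u, e = 1.0, 2.0, 4.0, 0.5
coeffs = conic_coefficients(ax, ay, u, e)
# P = (2, 2) satisfies d(P, F) = 1 and e * d(P, directrix) = 0.5 * 2 = 1.
P = (2.0, 2.0)
print(evaluate(coeffs, *P), foci(ax, ay, u, e))
```

For these values the two foci are symmetric about the centre $(0, 2)$ of the ellipse, and the distances from $P$ to the two foci add up to twice the semi-axis $e\,|u - a_x|/(1 - e^2)$.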
Remark 16.4.1 Notice that, if $e = 0$, the relation (16.28) becomes

\[ (y - a_y)^2 + (x - a_x)^2 = 0, \]

that is a degenerate imaginary conic, with

\[ \big( y - a_y + i(x - a_x) \big)\big( y - a_y - i(x - a_x) \big) = 0. \]

If we fix $e^2(u - a_x)^2 = r^2 \neq 0$ and consider the limit $e \to 0$, the Eq. (16.28) can be written as

\[ (x - a_x)^2 + (y - a_y)^2 = r^2. \]

This is another way of viewing a circle as a limiting case of a sequence of ellipses. The case for which the point $F \in \delta$ also gives a degenerate conic. In this case $u = a_x$ and the Eq. (16.28) is

\[ (y - a_y)^2 + (1 - e^2)(x - u)^2 = 0, \]

which is the union of two lines, either real (if $1 < e$) or imaginary (if $1 > e$).


16.5  Conic  Sections  and  Kepler  Motions

Via  the  notion  of  eccentricity  it  is  easier  to  describe  a  fundamental  relation  between 
the  conic  sections  and  the  so  called  Keplerian  motions. 

If $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ describe the motion in $E^3$ of two point masses $m_1$ and $m_2$, and the only force acting on them is the mutual gravitational attraction, the equations of motion are given by

\[ m_1 \ddot{\mathbf{x}}_1 = -G m_1 m_2\, \frac{\mathbf{x}_1 - \mathbf{x}_2}{\|\mathbf{x}_1 - \mathbf{x}_2\|^3}, \qquad m_2 \ddot{\mathbf{x}}_2 = -G m_1 m_2\, \frac{\mathbf{x}_2 - \mathbf{x}_1}{\|\mathbf{x}_1 - \mathbf{x}_2\|^3}. \]

Here $G$ is a constant, the gravitational constant. We know from physics that the centre of mass of this system moves with no acceleration, while for the relative motion $\mathbf{r}(t) = \mathbf{x}_1(t) - \mathbf{x}_2(t)$ the Newton equations are

\[ \mu\, \ddot{\mathbf{r}}(t) = -G m_1 m_2\, \frac{\mathbf{r}}{r^3} \tag{16.31} \]

with the norm $r = \|\mathbf{r}\|$ and $\mu = m_1 m_2/(m_1 + m_2)$ the so called reduced mass of the system. A qualitative analysis of this motion can be given as follows.

With a cartesian orthogonal reference system (O; x, y, z) in $E^3$, we can write $\mathbf{r}(t) = (x(t), y(t), z(t))$ and $\dot{\mathbf{r}}(t) = (\dot{x}(t), \dot{y}(t), \dot{z}(t))$ for the vector representing the corresponding velocity. From the Newton equations (16.31) the angular momentum $\mathbf{L}_O = \mu\, \mathbf{r} \wedge \dot{\mathbf{r}}$ (recall its definition and main properties from Sects. 1.3 and 11.2) with respect to the origin $O$ satisfies

\[ \frac{d\mathbf{L}_O}{dt} = \mu \left( \dot{\mathbf{r}} \wedge \dot{\mathbf{r}} + \mathbf{r} \wedge \ddot{\mathbf{r}} \right) = 0, \]

so it is a constant of the motion, since $\ddot{\mathbf{r}}$ is parallel to $\mathbf{r}$ from (16.31). This means that both vectors $\mathbf{r}(t)$ and $\dot{\mathbf{r}}(t)$ remain orthogonal to the direction of $\mathbf{L}_O$, which is constant: if the initial velocity $\dot{\mathbf{r}}(t = 0)$ is not parallel to the initial position $\mathbf{r}(t = 0)$, the motion stays at any time $t$ on the plane orthogonal to $\mathbf{L}_O(t = 0)$.

We can consider the plane of the motion as $E^2$, and fix a cartesian orthogonal reference system (O; x, y), so that the angular momentum conservation can be written as

\[ \mu\,(x\dot{y} - y\dot{x}) = l \]

with the constant $l$ fixed by the initial conditions. We also know that the gravitational force is conservative, thus the total energy

\[ \frac{1}{2}\,\mu\,\|\dot{\mathbf{r}}\|^2 - G m_1 m_2\, \frac{1}{r} = E \]

is also a constant of the motion. It is well known that the Eq. (16.31) can be completely solved. We omit the proof of this claim, and mention that the possible trajectories of such motions are conic sections, with focus $F = (0, 0) \in E^2$ and directrix $\delta$ given by the equation $x = \ell/e$ with

\[ \ell = \frac{l^2}{G m_1 m_2\, \mu} \]

and eccentricity parameter given by

\[ e^2 = 1 + \frac{2\mu E\, l^2}{(G m_1 m_2\, \mu)^2}. \]

One indeed shows that

\[ \frac{2\mu E\, l^2}{(G m_1 m_2\, \mu)^2} \geq -1 \]

for any choice of initial values for position and velocity.

This result is one of the reasons why conic sections deserve special attention in affine geometry. From the analysis of the previous section, we conclude that for $E < 0$, since $0 < e < 1$, the trajectory of the motion is elliptic. If the point mass $m_2$ represents the Sun, while $m_1$ is a planet in our solar system, this result gives the well observed fact that planet orbits are plane ellipses and the Sun is one of the foci of the orbit (Kepler law).

The Sun is also the focus of hyperbolic orbits ($E > 0$) or parabolic ones ($E = 0$), orbits that are travelled by comets and asteroids.
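The link between the sign of the energy and the type of orbit can be illustrated numerically. The sketch below assumes the standard two-body expression $e^2 = 1 + 2\mu E\, l^2/(G m_1 m_2 \mu)^2$ with $l = \mu(x\dot{y} - y\dot{x})$; the function name `kepler_eccentricity`, the choice of units with $G = 1$ and the sample masses are illustrative assumptions, not taken from the text.

```python
import math

def kepler_eccentricity(m1, m2, r0, v0, G=1.0):
    """Eccentricity and energy of the relative motion.

    r0, v0: relative position and velocity in the plane of the motion,
    given as pairs (x, y) and (x_dot, y_dot).
    """
    mu = m1 * m2 / (m1 + m2)                      # reduced mass
    k = G * m1 * m2
    r = math.hypot(r0[0], r0[1])
    E = 0.5 * mu * (v0[0]**2 + v0[1]**2) - k / r  # total energy
    l = mu * (r0[0] * v0[1] - r0[1] * v0[0])      # angular momentum
    e_squared = 1 + 2 * mu * E * l**2 / (k * mu)**2
    # Guard against tiny negative values produced by rounding.
    return math.sqrt(max(e_squared, 0.0)), E

m1, m2 = 1.0, 3.0
mu, k = m1 * m2 / (m1 + m2), m1 * m2
v_circular = math.sqrt(k / mu)        # speed of a circular orbit at r = 1
v_escape = math.sqrt(2 * k / mu)      # speed giving E = 0 at r = 1
print(kepler_eccentricity(m1, m2, (1.0, 0.0), (0.0, v_circular)))
print(kepler_eccentricity(m1, m2, (1.0, 0.0), (0.0, v_escape)))
```

With a tangential initial velocity at $r = 1$, the circular speed gives $E < 0$ and $e$ close to $0$, while the escape speed gives $E$ close to $0$ and $e$ close to $1$, the parabolic case.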


16.6  Reduction  to  Canonical  Form  of  a  Conic:  Rotations 

Let us consider two reference systems (O; x, y) and (O; X, Y) having the same origin and related by a rotation by an angle $\alpha$,

\[ x = \cos\alpha\, X + \sin\alpha\, Y, \qquad y = -\sin\alpha\, X + \cos\alpha\, Y. \]

With respect to (O; x, y), consider the parabola $\Gamma : y = x^2$. In the rotated system (O; X, Y) the equation for $\Gamma$ is easily found to be

\[ -\sin\alpha\, X + \cos\alpha\, Y = (\cos\alpha\, X + \sin\alpha\, Y)^2 \]

\[ \Rightarrow \quad \cos^2\alpha\, X^2 + \sin 2\alpha\, XY + \sin^2\alpha\, Y^2 + \sin\alpha\, X - \cos\alpha\, Y = 0. \]


We see that, as a consequence of the rotation, there is a mixed term $XY$ in the quadratic polynomial equation for the parabola $\Gamma$. It is natural to wonder whether such a behaviour can be reversed.
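The appearance of the mixed term can be verified numerically with a short sketch. The function names are illustrative; the inverse change of coordinates $X = \cos\alpha\, x - \sin\alpha\, y$, $Y = \sin\alpha\, x + \cos\alpha\, y$ used below follows from the orthogonality of the rotation matrix.

```python
import math

def rotated_parabola_coefficients(alpha):
    """Coefficients (c_XX, c_XY, c_YY, c_X, c_Y) of y = x^2 in (O; X, Y)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (c**2, math.sin(2 * alpha), s**2, s, -c)

def to_rotated(alpha, x, y):
    """Coordinates (X, Y) of the point with coordinates (x, y)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return c * x - s * y, s * x + c * y

def residual(alpha, x):
    """Rotated equation evaluated at the point (x, x^2) of the parabola."""
    X, Y = to_rotated(alpha, x, x**2)
    cXX, cXY, cYY, cX, cY = rotated_parabola_coefficients(alpha)
    return cXX * X**2 + cXY * X * Y + cYY * Y**2 + cX * X + cY * Y

alpha = 0.7
print(rotated_parabola_coefficients(alpha)[1])          # mixed coefficient sin(2 alpha)
print([residual(alpha, x) for x in (-2.0, 0.0, 1.3)])   # all close to zero
```

The mixed coefficient vanishes exactly when $\sin 2\alpha = 0$, that is when the rotation maps the coordinate axes onto themselves.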

Example 16.6.1 With respect to (O; x, y), consider the conic $C : xy = k$ for a real parameter $k$. Clearly, for $k = 0$ this is degenerate (the union of the coordinate axes $x$ and $y$). On the other hand, the rotation to the system (O; X, Y) by an angle $\alpha = \pi/4$,

\[ x = \tfrac{1}{\sqrt{2}}(X + Y), \qquad y = \tfrac{1}{\sqrt{2}}(X - Y), \]

transforms the equation of the conic to

\[ X^2 - Y^2 = 2k. \]

This is a hyperbola with foci $F_\pm = (\pm 2\sqrt{k}, 0)$ when $k > 0$ or $F_\pm = (0, \pm 2\sqrt{|k|})$ when $k < 0$.

In  general,  if  the  equation  of  a  conic  has  a  mixed  term,  does  there  exist  a  reference 
system  with  respect  to  which  the  equation  for  the  given  conic  does  not  have  the  mixed 
term? 

It  is  clear  that  the  answer  to  such  a  question  is  in  the  affirmative  if  and  only  if 
there  exists  a  reference  system  with  respect  to  which  the  quadratic  form  of  the  conic 
is  diagonal.  On  the  other  hand,  since  the  quadratic  form  associated  to  a  conic  is 
symmetric,  we  know  from  the  Chap.  10  that  it  is  always  possible  to  diagonalise  it 
with  a  suitable  orthogonal  matrix. 

Let  us  first  study  how  the  equation  in  (16.7)  for  a  conic  changes  under  a  general 
change  of  the  reference  system  of  the  affine  euclidean  plane  we  are  considering. 

Definition 16.6.2 With a rotation of the plane we mean a change in the reference system from (O; x, y) to (O; x', y') that is given by

\[ \begin{pmatrix} x \\ y \end{pmatrix} = P \begin{pmatrix} x' \\ y' \end{pmatrix} \tag{16.32} \]

with $P \in SO(2)$ a special orthogonal matrix, referred to as the rotation matrix. If we write

\[ P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}, \]

the transformation above reads

\[ x = p_{11} x' + p_{12} y', \qquad y = p_{21} x' + p_{22} y'. \tag{16.33} \]

These relations give the equations of the rotation.

A translation from the reference system (O; x', y') to another (O'; X, Y) is described by the relations

\[ x' = X + x_0, \qquad y' = Y + y_0, \tag{16.34} \]

where $(-x_0, -y_0)$ are the coordinates of the point $O$ with respect to (O'; X, Y) and, equivalently, $(x_0, y_0)$ are the coordinates of the point $O'$ with respect to (O; x', y').

A proper rigid transformation on the affine euclidean plane $E^2$ is a change of the reference system given by a rotation followed by a translation. We shall refer to a proper rigid transformation also under the name of roto-translation.

Let us consider the composition of the rotation given by (16.33) followed by the translation given by (16.34), so as to map the reference system (O; x, y) into (O'; X, Y). The equations describing such a transformation are easily found to be

\[ x = p_{11} X + p_{12} Y + a, \qquad y = p_{21} X + p_{22} Y + b, \tag{16.35} \]

where

\[ a = p_{11} x_0 + p_{12} y_0, \qquad b = p_{21} x_0 + p_{22} y_0 \]

are the coordinates of $O'$ with respect to (O; x, y). The transformation (16.35) can be written as

\[ \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & a \\ p_{21} & p_{22} & b \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} \tag{16.36} \]

and we call

\[ Q = \begin{pmatrix} p_{11} & p_{12} & a \\ p_{21} & p_{22} & b \\ 0 & 0 & 1 \end{pmatrix} \tag{16.37} \]

the matrix of (associated to) the proper rigid transformation (roto-translation).

Remark 16.6.3 A rotation matrix $P$ is special orthogonal, that is ${}^tP = P^{-1}$ and $\det(P) = 1$. A roto-translation matrix $Q$ as in (16.37), although it satisfies the identity $\det(Q) = 1$, is not orthogonal.

Clearly, with a transposition, the action (16.32) of a rotation matrix also gives $(x \ \ y) = (x' \ \ y')\, {}^tP$, while the action (16.36) of a roto-translation can be written as

\[ (x \ \ y \ \ 1) = (X \ \ Y \ \ 1)\, {}^tQ. \]


Let us then describe how the matrices associated to the equation of a conic are transformed under a roto-translation of the reference system. Let us consider a conic C described, with respect to the reference system (O; x, y), by

\[ (x \ \ y \ \ 1)\, B \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = 0, \qquad F_C(x, y) = (x \ \ y)\, A \begin{pmatrix} x \\ y \end{pmatrix}. \]

Under the roto-translation transformation (16.36) the equation of the conic C with respect to the reference system (O'; X, Y) is easily found to become

\[ (X \ \ Y \ \ 1)\, {}^tQ\, B\, Q \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} = 0. \]

Also, under the same transformation, the quadratic form for C reads

\[ F_C(x', y') = (x' \ \ y')\, {}^tP\, A\, P \begin{pmatrix} x' \\ y' \end{pmatrix} \]

with respect to the reference system (O; x', y') obtained from (O; x, y) under the action of only the rotation $P$. Such a claim is made clearer by the following proposition.

Proposition  16.6.4  The  quadratic  form  associated  to  a  conic  C  does  not  change  for 
a  translation  of  the  reference  system  with  respect  to  which  it  is  defined. 

Proof Let us consider, with respect to the reference system (O; x', y'), the conic with quadratic form

\[ F_C(x', y') = (x' \ \ y')\, A' \begin{pmatrix} x' \\ y' \end{pmatrix} = a_{11} (x')^2 + 2a_{12}\, x'y' + a_{22} (y')^2. \]

Under the translation (16.34) we have $x' = X + x_0$ and $y' = Y + y_0$, so the equation of the conic takes the form

\[ a_{11} X^2 + 2a_{12}\, XY + a_{22} Y^2 + \{\text{monomials of order} \leq 1\}. \]

The quadratic form associated to C, with respect to the reference system (O'; X, Y), is then

\[ F_C(X, Y) = a_{11} X^2 + 2a_{12}\, XY + a_{22} Y^2 = (X \ \ Y)\, A' \begin{pmatrix} X \\ Y \end{pmatrix} \]

with the same matrix $A'$. □

Given the quadratic form $F_C$ associated to the conic C in (O; x', y'), we have then the following:

\[ F_C(x', y') = (x' \ \ y')\, {}^tP\, A\, P \begin{pmatrix} x' \\ y' \end{pmatrix} \quad \Rightarrow \quad F_C(X, Y) = (X \ \ Y)\, {}^tP\, A\, P \begin{pmatrix} X \\ Y \end{pmatrix}. \]

All of the above proves the following theorem.

Theorem 16.6.5 Let C be a conic with associated matrix of the coefficients $B$ and matrix of the quadratic form $A$ with respect to the reference system (O; x, y). If $Q$ is the matrix of the roto-translation mapping the reference system (O; x, y) to (O'; X, Y), with $P$ the corresponding rotation matrix, the matrix of the coefficients associated to the conic C with respect to (O'; X, Y) is

\[ B' = {}^tQ\, B\, Q, \]

while the matrix of the quadratic form is

\[ A' = {}^tP\, A\, P = P^{-1} A\, P. \]

In light of the Definition 13.1.4, the matrices $A$ and $A'$ are quadratically equivalent. □

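Exercise 16.6.6 Consider the conic C whose equation, in the reference system (O; x, y), is

\[ x^2 - 2xy + y^2 + 4x + 4y - 1 = 0. \]

Its associated matrices are

\[ A = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & 2 \\ 2 & 2 & -1 \end{pmatrix}. \]

We first diagonalise the matrix $A$. Its characteristic polynomial is

\[ p_A(T) = |A - T I| = \begin{vmatrix} 1 - T & -1 \\ -1 & 1 - T \end{vmatrix} = (1 - T)^2 - 1 = T(T - 2). \]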

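The eigenvalues are $\lambda = 0$ and $\lambda = 2$ with associated eigenspaces,

\[ V_0 = \ker(f_A) = \{(x, y) \in \mathbb{R}^2 : x - y = 0\} = \mathcal{L}((1, 1)), \]

\[ V_2 = \ker(f_{A - 2I}) = \{(x, y) \in \mathbb{R}^2 : x + y = 0\} = \mathcal{L}((1, -1)). \]

It follows that the special orthogonal matrix $P$ giving the change of the basis is

\[ P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \]

with eigenvectors ordered so that $\det(P) = 1$. This rotates the reference system to (O; x', y') with

\[ x = \tfrac{1}{\sqrt{2}}(x' + y'), \qquad y = \tfrac{1}{\sqrt{2}}(-x' + y'). \]

Without translation, the roto-translation matrix is

\[ Q' = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix} \]

and from the Theorem 16.6.5, the matrix associated to C with respect to the reference system (O; x', y') is ${}^tQ'\, B\, Q'$. We have then

\[ {}^tQ'\, B\, Q' = \frac{1}{2} \begin{pmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix} \begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & 2 \\ 2 & 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & \sqrt{2} \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 2\sqrt{2} \\ 0 & 2\sqrt{2} & -1 \end{pmatrix} \]

so that the equation for C reads

\[ 2(x')^2 + 4\sqrt{2}\, y' - 1 = 0. \]

Collecting terms on the right-hand side, we write this equation as

\[ (x')^2 = -2\sqrt{2}\left( y' - \tfrac{\sqrt{2}}{8} \right). \]

With the translation

\[ X = x', \qquad Y = y' - \tfrac{\sqrt{2}}{8} \]

we see that C is a parabola with the canonical form

\[ X^2 = -2\sqrt{2}\, Y \]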
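and the associated matrices

\[ B' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & \sqrt{2} \\ 0 & \sqrt{2} & 0 \end{pmatrix}, \qquad A' = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \]

Rather than splitting the reduction to canonical form into a first step given by a rotation and a second step given by a translation, we can reduce the equation for C with respect to (O; x, y) to its canonical form by a proper rigid transformation with a matrix $Q$ encoding both a rotation and a translation. Such a composition is given by

\[ x = \tfrac{1}{\sqrt{2}}(x' + y') = \tfrac{1}{\sqrt{2}}\left( X + Y + \tfrac{\sqrt{2}}{8} \right), \qquad y = \tfrac{1}{\sqrt{2}}(-x' + y') = \tfrac{1}{\sqrt{2}}\left( -X + Y + \tfrac{\sqrt{2}}{8} \right), \]

which we write as

\[ \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = Q \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix} \]

with

\[ Q = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 & \sqrt{2}/8 \\ -1 & 1 & \sqrt{2}/8 \\ 0 & 0 & \sqrt{2} \end{pmatrix}. \]

We end this example by checking that the matrix associated to the conic C with respect to the reference system (O'; X, Y) can be computed as it is described in the Theorem 16.6.5, that is

\[ {}^tQ\, B\, Q = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 2\sqrt{2} \\ 0 & 2\sqrt{2} & 0 \end{pmatrix} = 2B'. \]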
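The final check of this exercise is a plain matrix product, so it can be reproduced with a few lines of dependency-free Python. The helpers `matmul` and `transpose` are illustrative, the entries of $B$ are read off from the equation $x^2 - 2xy + y^2 + 4x + 4y - 1 = 0$, and the comparison is done in floating point, so equality holds only up to rounding.

```python
import math

def matmul(M, N):
    """Product of two matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

s = 1 / math.sqrt(2)
B = [[1, -1, 2], [-1, 1, 2], [2, 2, -1]]
# Q = (1/sqrt 2) * [[1, 1, sqrt2/8], [-1, 1, sqrt2/8], [0, 0, sqrt2]]
Q = [[s, s, 1 / 8], [-s, s, 1 / 8], [0, 0, 1]]

B_new = matmul(transpose(Q), matmul(B, Q))
expected = [[2, 0, 0], [0, 0, 2 * math.sqrt(2)], [0, 2 * math.sqrt(2), 0]]
max_error = max(abs(B_new[i][j] - expected[i][j]) for i in range(3) for j in range(3))
print(B_new, max_error)
```

The vanishing of the lower-right entry confirms that the translation by $\sqrt{2}/8$ removes the constant term left after the rotation.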

We  list  the  main  steps  of  the  method  we  described  in  order  to  reduce  a  conic  to 
its  canonical  form  as  the  proof  of  the  following  results. 

Theorem 16.6.7 Given a conic C whose equation is written in the reference system (O; x, y), there always exists a reference system (O'; X, Y), obtained with a roto-translation from (O; x, y), with respect to which the equation for C is canonical.

Proof  Let  C  be  a  conic,  with  associated  matrices  A  (of  the  quadratic  form)  and  B 
(of  the  coefficients),  with  respect  to  the  reference  system  (O;  x,  y).  Then, 

(a) Diagonalise $A$, computing an orthonormal basis with eigenvectors $v_1 = (p_{11}, p_{21})$, $v_2 = (p_{12}, p_{22})$, given by the rotation

\[ x = p_{11} x' + p_{12} y', \qquad y = p_{21} x' + p_{22} y', \tag{16.38} \]

and define

\[ Q' = \begin{pmatrix} p_{11} & p_{12} & 0 \\ p_{21} & p_{22} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

With respect to the reference system (O; x', y') the conic C has matrix $B' = {}^tQ'\, B\, Q'$, and the corresponding quadratic equation, which we write as

\[ (x' \ \ y' \ \ 1)\, B' \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = 0, \tag{16.39} \]

lacks the monomial term $x'y'$.

(b) Complete the square so as to transform, by the corresponding translation, the reference system (O; x', y') to (O'; X, Y), that is

\[ X = x' + a, \qquad Y = y' + b. \tag{16.40} \]

From this, we can express the Eq. (16.39) for C with respect to the reference system (O'; X, Y). The resulting equation is canonical for C.

(c) The equations for the roto-translation from (O; x, y) to (O'; X, Y) are given by substituting the translation transformation (16.40) into (16.38).

Corollary 16.6.8 Given a degree-two polynomial equation in the variables $x$ and $y$, the set (locus) of zeros of such an equation is one of the following loci: ellipse, hyperbola, parabola, union of lines (either coincident or distinct).

The proof of the Proposition 16.3.4 together with the result of the Theorem 16.6.5, which gives the transformation relations for the matrices associated to a given conic C under a proper rigid transformation, allows one to prove the next proposition.

Proposition 16.6.9 A conic C whose associated matrices are $A$ and $B$ with respect to a given orthonormal reference system (O; x, y) is degenerate if and only if $\det B = 0$. Depending on the values of the determinant of $A$ the following cases are possible:

\[ \det A < 0 \ \Leftrightarrow \ C \text{ hyperbola}, \]

\[ \det A = 0 \ \Leftrightarrow \ C \text{ parabola}, \]

\[ \det A > 0 \ \Leftrightarrow \ C \text{ ellipse}. \]

The relative signs of $\det A$ and $\det B$ determine whether the conic is real or imaginary.
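The determinant criteria of the proposition translate directly into a small classifier. This is a sketch under stated assumptions: the function names and labels are hypothetical, the coefficients are those of $a_{11}x^2 + 2a_{12}xy + a_{22}y^2 + 2a_{13}x + 2a_{23}y + a_{33} = 0$, exact rational arithmetic is used because the tests compare determinants with zero, and the real or imaginary character is not examined.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def classify_conic(a11, a12, a22, a13, a23, a33):
    """Family of the conic from det A, with 'degenerate' when det B = 0."""
    a11, a12, a22, a13, a23, a33 = (Fraction(c) for c in (a11, a12, a22, a13, a23, a33))
    det_A = a11 * a22 - a12**2
    det_B = det3([[a11, a12, a13], [a12, a22, a23], [a13, a23, a33]])
    if det_A < 0:
        family = "hyperbola"
    elif det_A == 0:
        family = "parabola"
    else:
        family = "ellipse"
    return ("degenerate " if det_B == 0 else "") + family

print(classify_conic(1, -1, 1, 2, 2, -1))               # conic of Exercise 16.6.6
print(classify_conic(0, Fraction(1, 2), 0, 0, 0, -1))   # xy = 1
print(classify_conic(0, Fraction(1, 2), 0, 0, 0, 0))    # xy = 0
```

With floating-point coefficients the equalities `det_A == 0` and `det_B == 0` would need a tolerance, which is why the sketch converts its inputs to `Fraction`.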

Exercise 16.6.10 As an example, we recall the results obtained in the Sect. 16.4. For the conic $d(P, F) = e\, d(P, \delta)$ with focus $F = (a_x, a_y)$ and directrix $\delta : x = u$, the matrix of the coefficients associated to the Eq. (16.28) is

\[ B = \begin{pmatrix} 1 - e^2 & 0 & ue^2 - a_x \\ 0 & 1 & -a_y \\ ue^2 - a_x & -a_y & a_x^2 + a_y^2 - u^2e^2 \end{pmatrix} \]

with then $\det B = -e^2(a_x - u)^2$. We recover that the sign of $\det A = 1 - e^2$ determines whether the conic C is an ellipse, or a parabola, or a hyperbola. We notice also that the conic is degenerate if and only if at least one of the conditions $e = 0$ or $a_x = u$ is met.


16.7  Eccentricity:  Part  2 

We complete now the analysis of the conics defined by the relation

\[ d(P, F) = e\, d(P, \delta) \]

in terms of the eccentricity parameter. In Sect. 16.4 we have studied this equation with an arbitrary $F$ and $\delta$ parallel to the y-axis, when it becomes the Eq. (16.28). In general, for a given eccentricity the previous relation depends only on the distance between $F$ and $\delta$. Using a suitable roto-translation as in the previous section, we have the following result.

Proposition 16.7.1 Given a point $F$ and a line $\delta$ in $E^2$ such that $F \notin \delta$, there exists a cartesian orthogonal coordinate system (O'; X, Y) with $F = O'$ and with respect to which the equation $d(P, F) = e\, d(P, \delta)$ (with $e > 0$) is written as

\[ Y^2 + X^2 - e^2(X - u)^2 = 0. \]

Proof Given a point $F$ and a line $\delta$ not containing $F$, it is always possible to roto-translate the starting coordinate system (O; x, y) to a new one (O'; X, Y) in such a way that $O' = F$ and the line $\delta$ is given by the equation $X = u \neq 0$. The result then follows from (16.28) with $a_x = a_y = 0$. □

We know from the Sect. 16.4 that if $e = 1$, the equation represents a parabola with directrix $X = u \neq 0$ and focus $F = (0, 0)$. If $e \neq 1$, the equation represents either an ellipse ($0 < e < 1$) or a hyperbola ($e > 1$) with foci $F_+ = (0, 0)$ and $F_- = \left( -\frac{2ue^2}{1 - e^2}, 0 \right)$. Also, $e = 0$ yields the degenerate conic $X^2 + Y^2 = 0$, while $u = 0$ (that is $F \in \delta$) gives the degenerate conic $Y^2 + (1 - e^2)X^2 = 0$.

We  can  conclude  that  the  Eq.  (16.5)  represents  a  conic  whose  type  depends  on  the 
values  of  the  eccentricity  parameter.  Its  usefulness  resides  in  yielding  a  constructive 
method  to  write  the  equation  in  canonical  form,  even  for  the  degenerate  cases. 

We address the inverse question: given a non degenerate conic C with equation

\[ a_{11} x^2 + 2a_{12}\, xy + a_{22} y^2 + 2a_{13} x + 2a_{23} y + a_{33} = 0, \]

is it possible to determine its eccentricity and its directrix? We give a constructive proof of the following theorem.

Theorem 16.7.2 Given the non degenerate conic C whose equation is

\[ a_{11} x^2 + 2a_{12}\, xy + a_{22} y^2 + 2a_{13} x + 2a_{23} y + a_{33} = 0, \]

there exists a point $F$ and a line $\delta$ with $F \notin \delta$ such that a point $P \in C$ if and only if

\[ d(P, F) = e\, d(P, \delta) \]

for a suitable value $e > 0$ of the eccentricity parameter.

Proof As in the Exercise 16.6.6 we firstly diagonalise the matrix of the quadratic form of C, finding a cartesian orthogonal system (O; x', y') with respect to which the equation for C is written as

\[ \alpha_{11} (x')^2 + \alpha_{22} (y')^2 + 2\alpha_{13} x' + 2\alpha_{23} y' + \alpha_{33} = 0, \]

with $\alpha_{11}, \alpha_{22}$ the eigenvalues of the quadratic form. This is the equation of the conic in the form studied in the Proposition 16.3.4, whose proof we now use. We have the following cases.

(a) One of the eigenvalues of the quadratic form is zero, say $\alpha_{11} = 0$ (the case $\alpha_{22} = 0$ is analogous).

Up to a global (−1) factor that we can rescale, the equation for C is

\[ \alpha_{22}(y')^2 + 2\alpha_{13}x' + 2\alpha_{23}y' + \alpha_{33} = 0, \]

with $\alpha_{22} > 0$ and $\alpha_{13} \neq 0$ (non degeneracy of C). Since there is no term $(x')^2$, this equation is of the form (16.28) only if e = 1. Thus it is of the form (16.29), written as

\[ (y - a_y)^2 + 2(u - a_x)\left(x - \tfrac{1}{2}(u + a_x)\right) = 0. \]

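The two expressions are the same if and only if we have e = 1, and

\[ a_y = -\frac{\alpha_{23}}{\alpha_{22}} \]

and

\[ u - a_x = \frac{\alpha_{13}}{\alpha_{22}}, \qquad u + a_x = \frac{\alpha_{23}^2 - \alpha_{33}\alpha_{22}}{\alpha_{13}\alpha_{22}}. \]

These say that C is the parabola with focus and directrix given, with respect to $(O; x', y')$, by

\[ F = \left( \frac{\alpha_{23}^2 - \alpha_{33}\alpha_{22} - \alpha_{13}^2}{2\alpha_{13}\alpha_{22}},\; -\frac{\alpha_{23}}{\alpha_{22}} \right), \qquad \delta : \; x' = \frac{\alpha_{13}^2 + \alpha_{23}^2 - \alpha_{33}\alpha_{22}}{2\alpha_{13}\alpha_{22}}. \]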
With the translation

\[ X = x' + \frac{\alpha_{33}\alpha_{22} - \alpha_{23}^2}{2\alpha_{22}\alpha_{13}}, \qquad Y = y' + \frac{\alpha_{23}}{\alpha_{22}} \]

it can indeed be written as

\[ Y^2 + 2\,\frac{\alpha_{13}}{\alpha_{22}}\,X = 0. \]

If $\alpha_{22} = 0$ and $\alpha_{11} \neq 0$ the result would be similar, with the $x'$-axis and the $y'$-axis interchanged.

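(b) Assume $\alpha_{11} \neq 0$ and $\alpha_{22} \neq 0$. We write the equation for C as in (16.17),

\[ \frac{\alpha_{11}}{\alpha_{22}}\left(x' + \frac{\alpha_{13}}{\alpha_{11}}\right)^2 + \left(y' + \frac{\alpha_{23}}{\alpha_{22}}\right)^2 = \frac{1}{\alpha_{22}}\left(-\alpha_{33} + \frac{\alpha_{13}^2}{\alpha_{11}} + \frac{\alpha_{23}^2}{\alpha_{22}}\right) \tag{16.41} \]

and compare it with (16.30),

\[ (1 - e^2)\left(x + \frac{ue^2 - a_x}{1 - e^2}\right)^2 + (y - a_y)^2 = \frac{e^2(u - a_x)^2}{1 - e^2}. \tag{16.42} \]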


Notice that with this choice (that the directrix be parallel to the y-axis, x = u) we are not treating the axes x and y in an equivalent way. We would have a similar analysis when exchanging the role of the axes x and y. The conditions to satisfy are


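\[ 1 - e^2 = \frac{\alpha_{11}}{\alpha_{22}}, \qquad a_y = -\frac{\alpha_{23}}{\alpha_{22}}, \qquad \frac{e^2(u - a_x)^2}{1 - e^2} = \frac{h}{\alpha_{22}}, \qquad \frac{ue^2 - a_x}{1 - e^2} = \frac{\alpha_{13}}{\alpha_{11}}, \qquad \text{with } h = -\alpha_{33} + \frac{\alpha_{13}^2}{\alpha_{11}} + \frac{\alpha_{23}^2}{\alpha_{22}}. \tag{16.43} \]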


We see that h = 0 would give a degenerate conic with either e = 0 or $u = a_x$, that is, the focus is on the directrix. As before, up to a global (−1) factor we may assume $\alpha_{22} > 0$. And as in Sect. 16.3 we have two possibilities according to the sign of $\alpha_{11}$.

(b1) The eigenvalues have the same sign: $\alpha_{22} > 0$ and $\alpha_{11} > 0$. From the first condition in (16.43) we need $\alpha_{22} > \alpha_{11}$ and we get that e < 1. Then the condition in (16.43) involving h requires that the parameter h be positive. This means that C is a real ellipse. The case $\alpha_{22} < \alpha_{11}$ also results in a real ellipse but requires that the roles of the axes x and y be exchanged. (The condition $\alpha_{11} = \alpha_{22}$ would give a circle and result in e = 0, which we are excluding.)

(b2) The eigenvalues $\alpha_{11}$ and $\alpha_{22}$ have opposite signs. Now the conditions (16.43) require e > 1 and the parameter h to be negative. This means that C is a hyperbola of the second type in (16.11). To get the other type in (16.11), once again one needs to exchange the axes x and y.
As mentioned, the previous analysis is valid when the directrix is parallel to the y-axis. For the case when the directrix is parallel to the x-axis (with equation y = u), one has a similar analysis with the relations analogous to (16.43) now written as

\[ 1 - e^2 = \frac{\alpha_{22}}{\alpha_{11}}, \qquad a_x = -\frac{\alpha_{13}}{\alpha_{11}}, \qquad \frac{e^2(u - a_y)^2}{1 - e^2} = \frac{h}{\alpha_{11}}, \qquad \frac{ue^2 - a_y}{1 - e^2} = \frac{\alpha_{23}}{\alpha_{22}}, \qquad \text{with } h = -\alpha_{33} + \frac{\alpha_{13}^2}{\alpha_{11}} + \frac{\alpha_{23}^2}{\alpha_{22}}. \tag{16.44} \]

In particular, for $0 < \alpha_{22} < \alpha_{11}$ these are the data of a real ellipse, while for $\alpha_{11} > 0$ and $\alpha_{22} < 0$ (and h < 0) these are the data for a hyperbola of the first type in (16.11). □


In all cases above, the parameters $e$, $u$, $a_x$, $a_y$ are given in terms of the conic coefficients by the relations (16.43) or (16.44). Since these are quite cumbersome, we do not write out the complete solutions of these relations and rather illustrate with examples the general methods we developed.
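
Although the closed-form solutions are not written out in the text, the relations (16.43) are easy to solve numerically. The following Python lines are a minimal sketch, assuming the conic is already in the diagonal form $\alpha_{11}x^2 + \alpha_{22}y^2 + 2\alpha_{13}x + 2\alpha_{23}y + \alpha_{33} = 0$ with $\alpha_{11}, \alpha_{22} \neq 0$ and that the directrix is the line x = u. The function name and its arguments are illustrative and do not come from the text. The sketch uses $e^2 = 1 - \alpha_{11}/\alpha_{22}$, $a_y = -\alpha_{23}/\alpha_{22}$, $(u - a_x)^2 = h\,\alpha_{11}/(\alpha_{22}^2 e^2)$ and $ue^2 - a_x = \alpha_{13}/\alpha_{22}$, all of which follow from (16.43).

```python
import math


def focus_and_directrix(a11, a22, a13, a23, a33):
    """Solve the relations (16.43) for a directrix of the form x = u.

    The conic is a11*x^2 + a22*y^2 + 2*a13*x + 2*a23*y + a33 = 0
    (hypothetical argument names). Returns the eccentricity e and a
    list of two pairs ((a_x, a_y), u): focus and matching directrix.
    """
    h = -a33 + a13**2 / a11 + a23**2 / a22
    e2 = 1.0 - a11 / a22                  # e^2 from the first relation
    if e2 <= 0:
        raise ValueError("no directrix parallel to the y-axis; swap the axes")
    d2 = h * a11 / (a22**2 * e2)          # (u - a_x)^2
    if d2 <= 0:
        raise ValueError("degenerate or imaginary conic")
    a_y = -a23 / a22
    s = a13 / a22                         # equals u*e^2 - a_x
    pairs = []
    for d in (math.sqrt(d2), -math.sqrt(d2)):
        u = (d - s) * a22 / a11           # from u*(1 - e^2) = d - s
        a_x = u - d
        pairs.append(((a_x, a_y), u))
    return math.sqrt(e2), pairs


if __name__ == "__main__":
    # The ellipse x^2 + 4y^2 + 2x - 12y + 3 = 0.
    e, pairs = focus_and_directrix(1, 4, 1, -6, 3)
    print(e, pairs)
```

For the ellipse $x^2 + 4y^2 + 2x - 12y + 3 = 0$ the first pair returned is the focus $(-1 + \sqrt{21}/2,\, 3/2)$ with directrix $x = -1 + 2\sqrt{21}/3$, and $e = \sqrt{3}/2$.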

Exercise 16.7.3 Consider the hyperbolas

\[ x^2 - y^2 = k, \qquad k = \pm 1. \]

If k = 1, the relations (16.43) easily give the foci

\[ F_\pm = (\pm\sqrt{2},\, 0) \]

and corresponding directrix $\delta_\pm$ with equation

\[ x = \pm\frac{\sqrt{2}}{2}. \]

On the other hand, for k = −1, the relations (16.44) now give the foci

\[ F_\pm = (0,\, \pm\sqrt{2}) \]

and corresponding directrix $\delta_\pm$,

\[ y = \pm\frac{\sqrt{2}}{2}. \]

Exercise 16.7.4 Consider the conic C of Exercise 16.3.7, whose equation we write as

\[ x^2 + 4y^2 + 2x - 12y + 3 = (x + 1)^2 + 4\left(y - \tfrac{3}{2}\right)^2 - 7 = 0. \]

It is easy now to compute that this ellipse has eccentricity $e = \frac{\sqrt{3}}{2}$ and foci

\[ F_\pm = \left(-1 \pm \frac{\sqrt{21}}{2},\; \frac{3}{2}\right). \]

The directrix $\delta_\pm$ corresponding to the focus $F_\pm$ is given by the line

\[ x = -1 \pm \frac{2\sqrt{21}}{3}. \]

Exercise 16.7.5 Consider the conic C with equation

\[ x^2 - ky^2 - 2x - 2 = 0 \]

with a parameter $k \in \mathbb{R}$. By completing the square, we write this equation as

\[ (x - 1)^2 - ky^2 - 3 = 0. \]

Depending on the value of k, we have different cases.

(i) If k < −1, it is evident that C is a real ellipse with $\alpha_{11} < \alpha_{22}$, and the condition (16.43) gives eccentricity $e = \sqrt{1 + \frac{1}{k}}$, with foci

\[ F_\pm = \left(1 \pm \sqrt{3\left(1 + \tfrac{1}{k}\right)},\; 0\right) \tag{16.45} \]

and corresponding directrix $\delta_\pm$ with equation

\[ x = 1 \pm \sqrt{\frac{3k}{k+1}}. \tag{16.46} \]

(ii) If −1 < k < 0, the conic C is again a real ellipse, whose major axis is parallel to the y-axis, so $\alpha_{11} > \alpha_{22}$. Now the relations (16.44) yield eccentricity $e = \sqrt{1 + k}$, with foci

\[ F_\pm = \left(1,\; \pm\sqrt{-3\left(1 + \tfrac{1}{k}\right)}\right) \]

and corresponding directrix $\delta_\pm$ given by the lines with equation

\[ y = \pm\sqrt{\frac{3}{-k(k+1)}}. \]

(iii) If k = 0 the conic C is degenerate.

(iv) If k > 0, the conic C is a hyperbola. It is easy to compute the eccentricity to be $e = \sqrt{1 + \frac{1}{k}}$ (the same expression as for k < −1), with the foci and the directrix given by (16.45) and (16.46).

The matrix of the coefficients of this conic C and the matrix of its quadratic form are given by

\[ B = \begin{pmatrix} 1 & 0 & -1 \\ 0 & -k & 0 \\ -1 & 0 & -2 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 0 \\ 0 & -k \end{pmatrix}, \]

with det A = −k and det B = 3k. By Proposition 16.6.9 we recover the listed results: C is degenerate if and only if k = 0; it is a hyperbola if and only if k > 0; an ellipse if and only if k < 0.


16.8  Why  Conic  Sections 

We close the chapter by explaining where the loci on the affine euclidean plane $E^2$ that we have described, the conic sections, get their name from. This will also be related to finding solutions to a non-linear problem in $E^3$.

Fix a line $\gamma$ and a point $V \in \gamma$ in $E^3$. A (double) cone with axis $\gamma$ and vertex V is the bundle of lines through V whose direction vectors form, with respect to $\gamma$, an angle of fixed width.

Consider now a plane $\pi \subset E^3$ which does not contain the vertex of the cone. We show that, depending on the relative orientation of $\pi$ with the axis of the cone, the intersection $\pi \cap C$ (a conic section) is a non degenerate ellipse, or a parabola, or a hyperbola.

Let $(O, \mathcal{E}) = (O; x, y, z)$ be an orthonormal reference frame for $E^3$, with $\mathcal{E}$ an orthonormal basis for $E^3$. To be definite, we take the z-axis as the axis of a cone C, its vertex to be V = O and its width an angle $0 < \theta < \pi/2$. It is immediate to see that the cone C is given by the points P = (x, y, z) of the lines whose normalised direction vectors are

\[ E^3 \ni u(\alpha) = (\sin\theta\cos\alpha,\; \sin\theta\sin\alpha,\; \cos\theta) \]

with $\alpha \in [0, 2\pi)$. The parametric equation for these lines (see Definition 14.2.7) is then

\[ \begin{cases} x = \lambda \sin\theta\cos\alpha \\ y = \lambda \sin\theta\sin\alpha \\ z = \lambda \cos\theta \end{cases} \]

with $\lambda$ a real parameter. This expression provides a vector equation for the cone C. By eliminating the parameter, one gets a cartesian equation for C as given by the relation

\[ C : \; x^2 + y^2 - (\tan^2\theta)\, z^2 = 0. \]

Without loss of generality, we may intersect the cone C with a plane $\pi$ which is orthogonal to the yz coordinate plane and meets the z-axis at the point A = (0, 0, k) with k > 0. If $\beta \in (0, \pi/2)$ is the angle between the axis of the cone (the z-axis) and (its projection on) the plane $\pi$, the direction $S_\pi$ of the plane is orthogonal to the normalised vector $v = (0, \cos\beta, \sin\beta)$. We know from Chap. 15 that the cartesian equation for the plane $\pi$ is then

\[ \pi : \; (\cos\beta)\, y + (\sin\beta)(z - k) = 0. \]
The intersection $C \cap \pi$ is then given by the solution of the system

\[ \begin{cases} x^2 + y^2 - (\tan^2\theta)\, z^2 = 0 \\ (\cos\beta)\, y + (\sin\beta)(z - k) = 0 \end{cases} \tag{16.47} \]

This is the only problem in this textbook which is formulated in terms of a system of non-linear equations. By inserting the second equation in the first one, elementary algebra gives, for the projection on the plane xy of the intersection $C \cap \pi$, the equation

\[ x^2 + (1 - \tan^2\theta \cot^2\beta)\, y^2 + 2k \tan^2\theta \cot\beta\; y - k^2 \tan^2\theta = 0. \tag{16.48} \]

From what we have described above in this chapter, this equation represents a conic.

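Its matrix of the coefficients is

\[ B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 - \tan^2\theta \cot^2\beta & k \tan^2\theta \cot\beta \\ 0 & k \tan^2\theta \cot\beta & -k^2 \tan^2\theta \end{pmatrix}, \]

while the matrix of the quadratic form is

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 - \tan^2\theta \cot^2\beta \end{pmatrix}. \]

One then computes

\[ \det(A) = 1 - \tan^2\theta \cot^2\beta, \qquad \det(B) = -k^2 \tan^2\theta. \]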

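Having excluded the cases k = 0 and $\tan\theta = 0$, we know from Proposition 16.6.9 that the intersection $C \cap \pi$ represents a non degenerate real conic. Some algebra indeed shows that:

\[ \begin{aligned} \det(A) > 0 \;&\Leftrightarrow\; \tan^2\beta > \tan^2\theta \;\Leftrightarrow\; \beta > \theta, \\ \det(A) = 0 \;&\Leftrightarrow\; \tan^2\beta = \tan^2\theta \;\Leftrightarrow\; \beta = \theta, \\ \det(A) < 0 \;&\Leftrightarrow\; \tan^2\beta < \tan^2\theta \;\Leftrightarrow\; \beta < \theta, \end{aligned} \tag{16.49} \]

thus giving an ellipse, a parabola, a hyperbola respectively. These are shown in Figs. 16.4 and 16.5.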
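
The classification by the sign of det(A) can be tried out numerically. The Python lines below are a small sketch, assuming the expressions $\det(A) = 1 - \tan^2\theta\cot^2\beta$ and $\det(B) = -k^2\tan^2\theta$ for the projected conic; the function name `classify_section` and the tolerance `tol` are illustrative choices, not part of the text.

```python
import math


def classify_section(theta, beta, k, tol=1e-12):
    """Classify the intersection of the cone of width theta with the
    plane at angle beta to the axis, meeting the axis at height k.

    Angles are in radians, with 0 < theta < pi/2 and 0 < beta <= pi/2.
    """
    tan2 = math.tan(theta) ** 2
    cot2 = 1.0 / math.tan(beta) ** 2
    det_a = 1.0 - tan2 * cot2
    det_b = -(k ** 2) * tan2
    if abs(det_b) < tol:
        return "degenerate"      # the plane contains the vertex
    if abs(det_a) < tol:
        return "parabola"        # beta == theta up to rounding
    return "ellipse" if det_a > 0 else "hyperbola"


if __name__ == "__main__":
    print(classify_section(math.pi / 6, math.pi / 3, 1.0))  # beta > theta
    print(classify_section(math.pi / 4, math.pi / 4, 1.0))  # beta == theta
    print(classify_section(math.pi / 3, math.pi / 6, 1.0))  # beta < theta
```

The tolerance is needed because $\beta = \theta$ only gives det(A) = 0 up to floating-point rounding.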
Fig. 16.4 The ellipse with the limit case of the circle


Fig.  16.5  The  parabola  and  the  hyperbola 

Remark 16.8.1 As a particular case, if we take $\beta = \pi/2$, from (16.48) we see that $C \cap \pi$ is a circle with radius $R = k\tan\theta$. On the other hand, with k = 0, that is $\pi$ contains the vertex of the cone, one has det B = 0. In such a case, the (projected) Eq. (16.48) reduces to

\[ x^2 + (1 - \tan^2\theta \cot^2\beta)\, y^2 = 0. \]

Such an equation represents:

2a. the union of two complex conjugate lines for $\beta > \theta$,
2b. the points (x = 0, y), that is the y-axis, for $\beta = \theta$,
2c. the union of two real lines for $\beta < \theta$.
We conclude by giving a more transparent, in a sense, description of the intersection $C \cap \pi$, using a new reference system $(O; x', y', z')$ adapted to $\pi$: it is obtained by a rotation around the x-axis such that the plane $\pi$ is orthogonal to the $z'$-axis. From Chap. 11, the transformation we consider is given in terms of the matrix in SO(3),

\[ \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sin\beta & -\cos\beta \\ 0 & \cos\beta & \sin\beta \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

With respect to the new reference system, the system of Eq. (16.47) becomes

\[ \begin{cases} (x')^2 + \big((\sin\beta)\, y' + (\cos\beta)\, z'\big)^2 - (\tan^2\theta)\big((\sin\beta)\, z' - (\cos\beta)\, y'\big)^2 = 0 \\ z' - k\sin\beta = 0 \end{cases} \]

It is then easy to see that the solutions of this system of equations are the points having coordinates $z' = k\sin\beta$ and $(x', y')$ satisfying the equation

\[ (x')^2 + (\sin^2\beta - \tan^2\theta\cos^2\beta)(y')^2 + 2k\cos\beta\sin^2\beta\,(1 + \tan^2\theta)\, y' + (\cos^2\beta - \tan^2\theta\sin^2\beta)\, k^2\sin^2\beta = 0. \tag{16.50} \]

Clearly, this equation represents a conic on the plane $z' = k\sin\beta$ with respect to the orthonormal reference system $(O; x', y')$. Its matrix of the coefficients is

\[ B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sin^2\beta\,(1 - \tan^2\theta\cot^2\beta) & k\cos\beta\sin^2\beta\,(1 + \tan^2\theta) \\ 0 & k\cos\beta\sin^2\beta\,(1 + \tan^2\theta) & k^2\sin^2\beta\cos^2\beta\,(1 - \tan^2\theta\tan^2\beta) \end{pmatrix}, \]

while the matrix of the quadratic form is

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & \sin^2\beta\,(1 - \tan^2\theta\cot^2\beta) \end{pmatrix}. \]

One then computes

\[ \det(A) = \sin^2\beta\,(1 - \tan^2\theta\cot^2\beta), \qquad \det(B) = -k^2\sin^2\beta\tan^2\theta. \]


With $k \neq 0$ and $\tan\theta \neq 0$, clearly also in this case the relations (16.49) are valid. As particular cases, if we take $\beta = \pi/2$, one has that $C \cap \pi$ is a circle with radius $R = k\tan\theta$. On the other hand, for k = 0 (that is, $\pi$ contains the vertex of the cone), so that det B = 0, the Eq. (16.50) reduces to

\[ (x')^2 + (\sin^2\beta - \tan^2\theta\cos^2\beta)(y')^2 = 0. \]

Such an equation, as before, represents: the union of two complex conjugate lines for $\beta > \theta$; the points $x' = 0$ for $\beta = \theta$; the union of two real lines for $\beta < \theta$.




Remark 16.8.2 We remark that both Eqs. (16.48) and (16.50) describe the same type of conic, depending on the relative width of the angles $\beta$ and $\theta$. What differs is their eccentricity. The content of Sect. 16.7 allows us to compute that the eccentricity of the conic in (16.48) is $e^2 = \tan^2\theta\cot^2\beta$, while for the conic in (16.50) we have $e^2 = (1 + \tan^2\theta)\cos^2\beta$.
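
The two eccentricities can be compared numerically. This is only an illustrative Python sketch: the function names are hypothetical, and the angles are assumed to lie in $(0, \pi/2)$ so that the positive square roots can be taken. Since $1 + \tan^2\theta = 1/\cos^2\theta$, the second value is the same as $\cos\beta/\cos\theta$.

```python
import math


def eccentricity_projected(theta, beta):
    """Eccentricity of the projected conic (16.48): e = tan(theta) * cot(beta)."""
    return math.tan(theta) / math.tan(beta)


def eccentricity_in_plane(theta, beta):
    """Eccentricity of the conic (16.50) in the cutting plane:
    e = sqrt(1 + tan^2(theta)) * cos(beta) = cos(beta) / cos(theta)."""
    return math.sqrt(1.0 + math.tan(theta) ** 2) * math.cos(beta)


if __name__ == "__main__":
    theta, beta = math.pi / 6, math.pi / 3   # beta > theta: an ellipse
    print(eccentricity_projected(theta, beta))
    print(eccentricity_in_plane(theta, beta))
```

For $\theta = \pi/6$ and $\beta = \pi/3$ both values are below 1, as expected for an ellipse, but they are different numbers: the projection changes the shape of the conic while preserving its type.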


Appendix  A 

Algebraic  Structures 


This appendix is an elementary introduction to basic notions of set theory, together with those of group, ring and field. The reader is only supposed to know about numbers, more precisely natural (containing the zero 0), integer, rational and real numbers, that will be denoted respectively by $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$. Some of their properties will also be recalled in the following. We shall also introduce complex numbers, denoted $\mathbb{C}$, and (classes of) integers $\mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z}$.


A.1 A Few Notions of Set Theory

Definition A.1.1 Given any two sets A and B, by $A \times B$ we denote their Cartesian product. It is defined as the set of ordered pairs of elements from A and B, that is,

\[ A \times B = \{(a, b) \mid a \in A,\ b \in B\}. \]

Notice that $A \times B \neq B \times A$ since we are considering ordered pairs. The previous definition is valid for sets A, B of arbitrary cardinality. The set $A \times A$ is denoted $A^2$.

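Exercise A.1.2 Consider the set $A = \{\diamondsuit, \heartsuit, \clubsuit, \spadesuit\}$. The Cartesian product $A^2$ is then

\[ \begin{aligned} A^2 = A \times A = \{ & (\diamondsuit, \diamondsuit), (\diamondsuit, \heartsuit), (\diamondsuit, \clubsuit), (\diamondsuit, \spadesuit), (\heartsuit, \diamondsuit), (\heartsuit, \heartsuit), (\heartsuit, \clubsuit), (\heartsuit, \spadesuit), \\ & (\clubsuit, \diamondsuit), (\clubsuit, \heartsuit), (\clubsuit, \clubsuit), (\clubsuit, \spadesuit), (\spadesuit, \diamondsuit), (\spadesuit, \heartsuit), (\spadesuit, \clubsuit), (\spadesuit, \spadesuit) \}. \end{aligned} \]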

Definition A.1.3 Given any set A, a binary relation on A is any subset of the Cartesian product $A^2 = A \times A$. If such a subset is denoted by $\mathcal{R}$, we say that the pair of elements a, b in A are related or in relation if $(a, b) \in \mathcal{R}$, and write it as $a\,\mathcal{R}\,b$.

Fig. A.1 A binary relation in $\mathbb{N}^2$


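Example A.1.4 Consider $A = \mathbb{N}$, the set of natural numbers, with $\mathcal{R}$ the subset of $\mathbb{N}^2 = \mathbb{N} \times \mathbb{N}$ given by the points as in Fig. A.1. We see that $2\,\mathcal{R}\,1$, but it is not true that $1\,\mathcal{R}\,1$. One may easily check that $\mathcal{R}$ can be written by the formula

\[ n\,\mathcal{R}\,m \;\Leftrightarrow\; m = n - 1, \quad \text{for any } (n, m) \in \mathbb{N}^2. \]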


Definition A.1.5 A binary relation $\mathcal{R}$ on a set A is called an equivalence relation if the following properties are satisfied:

• $\mathcal{R}$ is reflexive, that is $a\,\mathcal{R}\,a$ for any $a \in A$,

• $\mathcal{R}$ is symmetric, that is $a\,\mathcal{R}\,b \Rightarrow b\,\mathcal{R}\,a$, for any $a, b \in A$,

• $\mathcal{R}$ is transitive, that is $a\,\mathcal{R}\,b$ and $b\,\mathcal{R}\,c$ $\Rightarrow$ $a\,\mathcal{R}\,c$, for any $a, b, c \in A$.

Exercise A.1.6 In any given set A, the equality is an equivalence relation. On the set T of all triangles, congruence of triangles and similarity of triangles are equivalence relations. The relation described in Example A.1.4 is not an equivalence relation, since reflexivity does not hold.

Definition A.1.7 Consider a set A and let $\mathcal{R}$ be an equivalence relation defined on it. For any $a \in A$, one defines the subset

\[ [a] = \{x \in A \mid x\,\mathcal{R}\,a\} \subseteq A \]

as the equivalence class of a in A. Any element $x \in [a]$ is called a representative of the class [a]. It is clear that an equivalence class has as many representatives as the elements it contains.


Proposition A.1.8 With $\mathcal{R}$ an equivalence relation on the set A, the following properties hold:

(1) If $a\,\mathcal{R}\,b$, then $[a] = [b]$.

(2) If $(a, b) \notin \mathcal{R}$, then $[a] \cap [b] = \emptyset$.

(3) $A = \bigcup_{a \in A} [a]$; this is a disjoint union.

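Proof (1) One shows that the mutual inclusions $[a] \subseteq [b]$ and $[b] \subseteq [a]$ are both valid if $a\,\mathcal{R}\,b$. Let $x \in [a]$; this means $x\,\mathcal{R}\,a$. From the hypothesis $a\,\mathcal{R}\,b$, so by the transitivity of $\mathcal{R}$ one has $x\,\mathcal{R}\,b$, that is $x \in [b]$. This proves the inclusion $[a] \subseteq [b]$. The proof of the inclusion $[b] \subseteq [a]$ is analogous.

(2) Let us suppose that there is an element $x \in [a] \cap [b]$. It would mean that $x\,\mathcal{R}\,a$ and $x\,\mathcal{R}\,b$. From the symmetry of $\mathcal{R}$ we would then have $a\,\mathcal{R}\,x$, and from the transitivity this would result in $a\,\mathcal{R}\,b$, which contradicts the hypothesis.

(3) Every $a \in A$ belongs to its own class [a] by reflexivity, so the classes cover A; by (1) and (2), two classes either coincide or are disjoint. □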

Definition A.1.9 The decomposition $A = \bigcup_{a \in A} [a]$ is called the partition of A associated (or corresponding) to the equivalence relation $\mathcal{R}$.

Definition A.1.10 If $\mathcal{R}$ is an equivalence relation defined on the set A, the set whose elements are the corresponding equivalence classes is denoted $A/\mathcal{R}$ and called the quotient of A modulo $\mathcal{R}$. The map

\[ \pi : A \to A/\mathcal{R} \quad \text{given by} \quad a \mapsto [a] \]

is called the canonical projection of A onto the quotient $A/\mathcal{R}$.
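
For a finite set, the partition into equivalence classes can be computed directly from the relation. The following Python sketch is only an illustration: the function name `quotient` and the example relation (congruence modulo 3 on the integers 0, ..., 6) are assumptions chosen here, and the code relies on the relation really being an equivalence relation, so that comparing with a single representative of each class is enough.

```python
def quotient(elements, related):
    """Return the equivalence classes of `elements` under `related`.

    `related(a, b)` must be an equivalence relation; by transitivity it
    then suffices to compare a new element with one representative of
    each class already found.
    """
    classes = []
    for a in elements:
        for cls in classes:
            if related(a, cls[0]):     # cls[0] is a representative
                cls.append(a)
                break
        else:
            classes.append([a])        # a starts a new class [a]
    return classes


if __name__ == "__main__":
    same_remainder = lambda a, b: (a - b) % 3 == 0
    print(quotient(range(7), same_remainder))
```

Each element lands in exactly one class, which mirrors the statement that A is the disjoint union of its equivalence classes.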


A.2  Groups 

A  set  has  an  algebraic  structure  if  it  is  equipped  with  one  or  more  operations.  When 
the  operations  are  more  than  one,  they  are  required  to  be  compatible.  In  this  section 
we  describe  the  most  elementary  algebraic  structures. 

Definition A.2.1 Given a set G, a binary operation ∗ on it is a map

\[ * : G \times G \to G. \]

The image of the operation between a and b is denoted by $a * b$. One also says that G is closed, or stable, with respect to the operation ∗. One usually writes (G, ∗) for the algebraic structure ∗ defined on G, that is for the set G equipped with the binary operation ∗.

Example A.2.2 It is evident that the usual sum and the usual product in $\mathbb{N}$ are binary operations.

As a further example we describe a binary operation which does not come from usual arithmetic operations in any set of numbers. Let T be an equilateral triangle whose vertices are ordered and denoted by ABC. Let R be the set of rotations on a plane under which each vertex is taken onto a vertex. The rotation that takes the vertices ABC to BCA can be denoted by

\[ \begin{pmatrix} A & B & C \\ B & C & A \end{pmatrix}. \]

It is clear that R contains three elements, which are:

\[ e = \begin{pmatrix} A & B & C \\ A & B & C \end{pmatrix}, \qquad x = \begin{pmatrix} A & B & C \\ B & C & A \end{pmatrix}, \qquad y = \begin{pmatrix} A & B & C \\ C & A & B \end{pmatrix}. \]

The operation (denoted now $\circ$) that we consider among elements in R is the composition of rotations. The rotation $x \circ y$ is the one obtained by acting on the vertices of the triangle first with y and then with x. It is easy to see that $x \circ y = e$. The Table A.1 shows the composition law among elements in R.

\[ \begin{array}{c|ccc} \circ & e & x & y \\ \hline e & e & x & y \\ x & x & y & e \\ y & y & e & x \end{array} \tag{A.1} \]


Remark A.2.3 The algebraic structures $(\mathbb{N}, +)$ and $(\mathbb{N}, \cdot)$ have the following well known properties, for all elements $a, b, c \in \mathbb{N}$,

\[ a + (b + c) = (a + b) + c, \qquad a + b = b + a, \]

\[ a \cdot (b \cdot c) = (a \cdot b) \cdot c, \qquad a \cdot b = b \cdot a. \]

The set $\mathbb{N}$ has elements, denoted 0 and 1, whose properties are singled out,

\[ 0 + a = a, \qquad 1 \cdot a = a \]

for any $a \in \mathbb{N}$. We give the following definition.

Definition A.2.4 Let (G, ∗) be an algebraic structure.

(a) (G, ∗) is called associative if

\[ a * (b * c) = (a * b) * c \]

for any $a, b, c \in G$.

(b) (G, ∗) is called commutative (or abelian) if

\[ a * b = b * a \]

for any $a, b \in G$.

(c) An element $e \in G$ is called an identity (or a neutral element) for (G, ∗) (and the algebraic structure is often denoted by (G, ∗, e)) if

\[ e * a = a * e = a \]

for any $a \in G$.

(d) Let (G, ∗, e) be an algebraic structure with an identity e, and let $a \in G$. An element $b \in G$ such that

\[ a * b = b * a = e \]

is called the inverse of a, and denoted by $a^{-1}$. The elements for which an inverse exists are called invertible.

Remark A.2.5 If the algebraic structure is given by a ‘sum rule’, like in $(\mathbb{N}, +)$, the neutral element is usually called a zero element, denoted by 0, with $a + 0 = 0 + a = a$. Also, the inverse of an element a is usually denoted by $-a$ and named the opposite of a.

Example A.2.6 It is easy to see that for the sets considered above one has $(\mathbb{N}, +, 0)$, $(\mathbb{N}, \cdot, 1)$, $(R, \circ, e)$. Every element in R is invertible (since one has $x \circ y = y \circ x = e$); the set $(\mathbb{N}, \cdot, 1)$ contains only one invertible element, which is the identity itself, and likewise in $(\mathbb{N}, +, 0)$ the only invertible element is the identity 0.

From the defining relation (d) above one clearly has that if $a^{-1}$ is the inverse of $a \in (G, *)$, then a is the inverse of $a^{-1}$. This suggests a way to enlarge sets containing elements which are not invertible, so as to have a new algebraic structure whose elements are all invertible. For instance, one could define the set of integer numbers $\mathbb{Z} = \{\pm n : n \in \mathbb{N}\}$ and see that every element in $(\mathbb{Z}, +, 0)$ is invertible.

Definition  A.2.7  An  algebraic  structure  (G,  *)  is  called  a  group  when  the  following 
properties  are  satisfied 

(a)  the  operation  *  is  associative, 

(b)  G  contains  an  identity  element  e  with  respect  to  *, 

(c)  every  element  in  G  is  invertible  with  respect  to  e. 

A group (G, ∗, e) is called commutative (or abelian) if the operation ∗ is commutative.

Remark A.2.8 Both $(\mathbb{Z}, +, 0)$ and $(R, \circ, e)$ are abelian groups.

Proposition A.2.9 Let (G, ∗, e) be a group. Then

(i) the identity element is unique,

(ii) the inverse $a^{-1}$ of any element $a \in G$ is unique.

Proof (i) Let us suppose that e, e' are both identities for (G, ∗). Then it should be $e * e' = e$ since e' is an identity, and also $e * e' = e'$ since e is an identity; this would then mean $e = e'$.

(ii) Let b, c both be inverse elements to $a \in G$; this would give $a * b = b * a = e$ and $a * c = c * a = e$. Since the binary operation is associative, one has $b * (a * c) = (b * a) * c$, resulting in $b * e = e * c$ and then $b = c$. □
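
For a finite structure such as the rotations $(R, \circ, e)$ of Table A.1, the group axioms and the uniqueness statements can be checked by brute force. The Python sketch below is illustrative only; the dictionary `TABLE` transcribes the composition law $e, x, y$ with $x \circ x = y$, $x \circ y = y \circ x = e$, $y \circ y = x$, and the function names are choices made here.

```python
from itertools import product

ELEMENTS = ("e", "x", "y")

# Composition law: TABLE[(a, b)] is a o b.
TABLE = {
    ("e", "e"): "e", ("e", "x"): "x", ("e", "y"): "y",
    ("x", "e"): "x", ("x", "x"): "y", ("x", "y"): "e",
    ("y", "e"): "y", ("y", "x"): "e", ("y", "y"): "x",
}


def op(a, b):
    return TABLE[(a, b)]


def is_associative():
    return all(op(a, op(b, c)) == op(op(a, b), c)
               for a, b, c in product(ELEMENTS, repeat=3))


def identities():
    """All elements acting as a two-sided identity."""
    return [e for e in ELEMENTS
            if all(op(e, a) == a and op(a, e) == a for a in ELEMENTS)]


def inverses(identity):
    """For each element, the list of its two-sided inverses."""
    return {a: [b for b in ELEMENTS
                if op(a, b) == identity and op(b, a) == identity]
            for a in ELEMENTS}


if __name__ == "__main__":
    print(is_associative(), identities(), inverses("e"))
```

The lists returned by `identities` and `inverses` each contain exactly one entry, in agreement with the uniqueness of the identity and of inverses.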
A.3 Rings and Fields

Next  we  introduce  and  study  the  properties  of  a  set  equipped  with  two  binary 
operations — compatible  in  a  suitable  sense — which  resemble  the  sum  and  the  product 
of  integer  numbers  in  Z. 

Definition A.3.1 Let $A = (A, +, 0_A, \cdot, 1_A)$ be a set with two operations, called sum (+) and product (·), with two distinguished elements called $0_A$ and $1_A$. The set A is called a ring if the following conditions are satisfied:

(a) $(A, +, 0_A)$ is an abelian group,

(b) the product · is associative,

(c) $1_A$ is the identity element with respect to the product,

(d) one has $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ for any $a, b, c \in A$.

If moreover the product is abelian, A is called an abelian ring.

Example A.3.2 The set $(\mathbb{Z}, +, 0, \cdot, 1)$ is clearly an abelian ring.

Definition A.3.3 By $\mathbb{Z}[X]$ one denotes the set of polynomials in the indeterminate (or variable) X with coefficients in $\mathbb{Z}$, that is the set of formal expressions,

\[ \mathbb{Z}[X] = \left\{ \sum_{i=0}^{n} a_i X^i = a_n X^n + a_{n-1} X^{n-1} + \ldots + a_1 X + a_0 \;:\; n \in \mathbb{N},\ a_i \in \mathbb{Z} \right\}. \]

If $\mathbb{Z}[X] \ni p(X) = a_n X^n + a_{n-1} X^{n-1} + \ldots + a_1 X + a_0$ then $a_0, a_1, \ldots, a_n$ are the coefficients of the polynomial p(X), while the term $a_i X^i$ is a monomial of degree i. The degree of the polynomial p(X) is the highest degree among those of its non zero monomials. If p(X) is the polynomial above, its degree is n provided $a_n \neq 0$, and one denotes deg p(X) = n. The two usual operations of sum and product in $\mathbb{Z}[X]$ are defined as follows. Let p(X), q(X) be two arbitrary polynomials in $\mathbb{Z}[X]$,

\[ p(X) = \sum_{i=0}^{n} a_i X^i, \qquad q(X) = \sum_{i=0}^{m} b_i X^i. \]

Let us suppose $n \leq m$. One sets

\[ p(X) + q(X) = \sum_{j=0}^{m} c_j X^j, \]

with $c_j = a_j + b_j$ for $0 \leq j \leq n$ and $c_j = b_j$ for $n < j \leq m$. One would have an analogous result were n > m. For the product one sets

\[ p(X) \cdot q(X) = \sum_{h=0}^{m+n} d_h X^h, \]

where $d_h = \sum_{i+j=h} a_i b_j$.
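
The two definitions translate directly into operations on lists of coefficients. In the Python sketch below a polynomial $a_0 + a_1 X + \ldots + a_n X^n$ is represented, as an assumption made here for illustration, by the non-empty list `[a0, a1, ..., an]` of integers; the function names are hypothetical.

```python
def poly_add(p, q):
    """Sum of two polynomials: c_j = a_j + b_j, padding the shorter
    coefficient list with zeros."""
    m = max(len(p), len(q))
    p = p + [0] * (m - len(p))
    q = q + [0] * (m - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_mul(p, q):
    """Product of two polynomials: d_h is the sum of a_i * b_j over
    all pairs of indices with i + j = h."""
    d = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            d[i + j] += a * b
    return d


if __name__ == "__main__":
    print(poly_add([1, 2], [3, 0, 5]))    # (1 + 2X) + (3 + 5X^2)
    print(poly_mul([1, 1], [-1, 1]))      # (1 + X)(-1 + X)
```

The second call multiplies $1 + X$ by $-1 + X$ and returns the coefficients of $-1 + X^2$.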

Proposition  A.3.4  Endowed  with  the  sum  and  the  product  as  defined  above,  the 
set  Z[X]  is  an  abelian  ring,  the  ring  of  polynomials  in  one  variable  with  integer 
coefficients. 

Proof One simply transfers to $\mathbb{Z}[X]$ the analogous structures and properties of the ring $(\mathbb{Z}, +, 0, \cdot, 1)$. Let $0_{\mathbb{Z}[X]}$ be the null polynomial, that is the polynomial whose coefficients are all equal to $0_{\mathbb{Z}}$, and let $1_{\mathbb{Z}[X]} = 1_{\mathbb{Z}}$ be the polynomial of degree 0 whose only non zero coefficient is equal to $1_{\mathbb{Z}}$. We limit ourselves to proving that $(\mathbb{Z}[X], +, 0_{\mathbb{Z}[X]})$ is an abelian group.

• Clearly, the null polynomial $0_{\mathbb{Z}[X]}$ is the identity element with respect to the sum of polynomials.

• Let us consider three arbitrary polynomials in $\mathbb{Z}[X]$,

\[ p(X) = \sum_{i=0}^{n} a_i X^i, \qquad q(X) = \sum_{i=0}^{m} b_i X^i, \qquad r(X) = \sum_{i=0}^{p} c_i X^i. \]

We show that

\[ (p(X) + q(X)) + r(X) = p(X) + (q(X) + r(X)). \]

For simplicity we consider the case n = m = p, since the proof for the general case is analogous. From the definition of sum of polynomials, one has

\[ A(X) = (p(X) + q(X)) + r(X) = \sum_{i=0}^{n} (a_i + b_i) X^i + \sum_{i=0}^{n} c_i X^i = \sum_{i=0}^{n} [(a_i + b_i) + c_i] X^i \]

and

\[ B(X) = p(X) + (q(X) + r(X)) = \sum_{i=0}^{n} a_i X^i + \sum_{i=0}^{n} (b_i + c_i) X^i = \sum_{i=0}^{n} [a_i + (b_i + c_i)] X^i. \]

The coefficients of A(X) and B(X) are given, for any $i = 0, \ldots, n$, by

\[ [(a_i + b_i) + c_i] \quad \text{and} \quad [a_i + (b_i + c_i)] \]

and they coincide since the sum in $\mathbb{Z}$ is associative. This means that A(X) = B(X).

• We show next that any polynomial $p(X) = \sum_{i=0}^{n} a_i X^i$ is invertible with respect to the sum in $\mathbb{Z}[X]$. Let us define the polynomial $p'(X) = \sum_{i=0}^{n} (-a_i) X^i$, with $(-a_i)$ denoting the inverse of $a_i \in \mathbb{Z}$ with respect to the sum. From the definition of the sum of polynomials, one clearly has

\[ p(X) + p'(X) = \sum_{i=0}^{n} a_i X^i + \sum_{i=0}^{n} (-a_i) X^i = \sum_{i=0}^{n} (a_i - a_i) X^i. \]

Since $a_i - a_i = 0_{\mathbb{Z}}$ for any i, one has $p(X) + p'(X) = 0_{\mathbb{Z}[X]}$; thus $p'(X)$ is the inverse of p(X).

• Finally, we show that the sum in $\mathbb{Z}[X]$ is abelian. Let p(X) and q(X) be two arbitrary polynomials in $\mathbb{Z}[X]$ of the same degree deg p(X) = n = deg q(X) (again for simplicity); we wish to show that

\[ p(X) + q(X) = q(X) + p(X). \]

From the definition of sum of polynomials,

\[ U(X) = p(X) + q(X) = \sum_{i=0}^{n} (a_i + b_i) X^i, \qquad V(X) = q(X) + p(X) = \sum_{i=0}^{n} (b_i + a_i) X^i; \]

the coefficients of U(X) and V(X) are given, for any $i = 0, \ldots, n$, by

\[ a_i + b_i \quad \text{and} \quad b_i + a_i \]

which coincide since the sum is abelian in $\mathbb{Z}$. This means U(X) = V(X).

We leave as an exercise to finish showing that $\mathbb{Z}[X]$ with the sum and the product above fulfils the conditions (b)-(d) in Definition A.3.1 of a ring. □

Remark A.3.5 Direct computations show the following well known properties of the abelian ring $\mathbb{Z}[X]$ of polynomials. With non zero $f(X), g(X) \in \mathbb{Z}[X]$ it holds that:

(i) $\deg(f(X) + g(X)) \leq \max\{\deg(f(X)), \deg(g(X))\}$;

(ii) $\deg(f(X) \cdot g(X)) = \deg(f(X)) + \deg(g(X))$.

It is easy to see that the set $(\mathbb{Q}, +, 0, \cdot, 1)$ of rational numbers is an abelian ring as well. The set $\mathbb{Q}$ has indeed a richer algebraic structure than $\mathbb{Z}$: any non zero element $0 \neq a \in \mathbb{Q}$ is invertible with respect to the product. If $a = p/q$ with $p \neq 0$, then $a^{-1} = q/p \in \mathbb{Q}$.

Definition A.3.6 An abelian ring $K = (K, +, 0, \cdot, 1)$ such that each element $0 \neq a \in K$ is invertible with respect to the product · is called a field. Equivalently, one sees that $(K, +, 0, \cdot, 1)$ is a field if and only if both $(K, +, 0)$ and $(K \setminus \{0\}, \cdot, 1)$ are abelian groups and the product is distributive with respect to the sum, that is the condition (d) of Definition A.3.1 is satisfied.

Example A.3.7 Clearly $(\mathbb{Q}, +, 0, \cdot, 1)$ is a field, while $(\mathbb{Z}, +, 0, \cdot, 1)$ is not. The fundamental example of a field for us will be the set $\mathbb{R} = (\mathbb{R}, +, 0, \cdot, 1)$ of real numbers equipped with the usual definitions of sum and product.

Analogously to Definition A.3.3 one can define the sets $\mathbb{Q}[X]$ and $\mathbb{R}[X]$ of polynomials with rational and real coefficients. For them one naturally extends the definitions of sum and product, as well as that of degree.

Proposition A.3.8 The sets $\mathbb{Q}[X]$ and $\mathbb{R}[X]$ are both abelian rings. □

It is worth stressing that, in spite of the fact that $\mathbb{Q}$ and $\mathbb{R}$ are fields, neither $\mathbb{Q}[X]$ nor $\mathbb{R}[X]$ is a field, since a polynomial need not admit an inverse with respect to the product.


A.4  Maps  Preserving  Algebraic  Structures 

The  Definition  A. 2.1  introduces  the  notion  of  algebraic  structure  (G,  *)  and  we  have 
described  what  groups,  rings  and  fields  are.  We  now  briefly  deal  with  maps  between 
algebraic  structures  of  the  same  kind,  which  preserve  the  binary  operations  defined 
in  them.  We  have  the  following  definition 

Definition A.4.1

A map $f : G \to G'$ between two groups $(G, *_G, e_G)$ and $(G', *_{G'}, e_{G'})$ is a group
homomorphism if

\[ f(x *_G y) = f(x) *_{G'} f(y) \quad \text{for all } x, y \in G. \]

A map $f : A \to B$ between two rings $(A, +_A, 0_A, \cdot_A, 1_A)$ and $(B, +_B, 0_B, \cdot_B, 1_B)$
is a ring homomorphism if

\[ f(x +_A y) = f(x) +_B f(y), \qquad f(x \cdot_A y) = f(x) \cdot_B f(y) \quad \text{for all } x, y \in A. \]

Example A.4.2 The natural inclusions $\mathbb{Z} \subset \mathbb{Q}$, $\mathbb{Q} \subset \mathbb{R}$ are ring homomorphisms, as
well as the inclusion $\mathbb{Z} \subset \mathbb{Z}[x]$ and similar ones.

Exercise A.4.3 The map $\mathbb{Z} \to \mathbb{Z}$ defined by $n \mapsto 2n$ is a group homomorphism
with respect to the group structure $(\mathbb{Z}, +, 0)$, but not a ring homomorphism with
respect to the ring structure $(\mathbb{Z}, +, 0, \cdot, 1)$.
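
The claim of this exercise can be probed numerically. The following is a minimal Python sketch, assuming a small finite sample of integers; the names `double` and `sample` are illustrative and do not come from the text. A finite sample cannot prove additivity in general, but a single failing pair is a genuine counterexample to multiplicativity.

```python
# Sketch: test the map n -> 2n on a finite sample of integers.
# `double` and `sample` are illustrative names, not notation from the text.

def double(n):
    return 2 * n

sample = range(-5, 6)

# Group homomorphism for (Z, +, 0): f(x + y) = f(x) + f(y).
additive = all(double(x + y) == double(x) + double(y)
               for x in sample for y in sample)

# A ring homomorphism would also need f(x * y) = f(x) * f(y).
multiplicative = all(double(x * y) == double(x) * double(y)
                     for x in sample for y in sample)

print(additive)        # True
print(multiplicative)  # False, e.g. double(1 * 1) = 2 but double(1) * double(1) = 4
```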
To lighten notation, from now on we shall denote a sum by $+$ and a product by $\cdot$
(and more generally a binary operation by $*$), irrespective of the set in which they
are defined. It will be clear from the context which one they refer to.

Group  homomorphisms  present  some  interesting  properties,  as  we  now  show. 

Proposition A.4.4 Let $(G, *, e_G)$ and $(G', *, e_{G'})$ be two groups, and $f : G \to G'$
a group homomorphism. Then,

(i) $f(e_G) = e_{G'}$,

(ii) $f(a^{-1}) = (f(a))^{-1}$, for any $a \in G$.

Proof (i) Since $e_G$ is the identity element with respect to the operation $*$ in $G$, we can write

\[ f(e_G) = f(e_G * e_G) = f(e_G) * f(e_G), \]

where the second equality is valid as $f$ is a group homomorphism. Since
$f(e_G) \in G'$, it has a unique inverse (see Proposition A.2.9), $(f(e_G))^{-1} \in G'$,
by which we can multiply both sides of the previous equality, thus yielding

\[ f(e_G) * (f(e_G))^{-1} = f(e_G) * f(e_G) * (f(e_G))^{-1}. \]

This relation results in

\[ e_{G'} = f(e_G) * e_{G'} \quad \Longrightarrow \quad e_{G'} = f(e_G). \]


(ii) Making use again of Proposition A.2.9, in order to show that $f(a^{-1})$ is the
inverse (with respect to the product in $G'$) of $f(a)$ it suffices to show that

\[ f(a) * f(a^{-1}) = e_{G'}. \]

From the definition of group homomorphism, one has

\[ f(a) * f(a^{-1}) = f(a * a^{-1}) = f(e_G) = e_{G'}, \]

where the last equality follows from (i).

If $f : A \to B$ is a ring homomorphism, the previous properties are valid with
respect to both the sum and the product, that is

(i') $f(0_A) = 0_B$ and $f(1_A) = 1_B$;

(ii') $f(-a) = -f(a)$ for any $a \in A$, while $f(a^{-1}) = (f(a))^{-1}$ for any invertible
(with respect to the product) element $a \in A$ with inverse $a^{-1}$. □
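
Properties (i) and (ii) of Proposition A.4.4 can be illustrated on a concrete group homomorphism. The sketch below, in Python, assumes the parity map $n \mapsto (-1)^n$ from $(\mathbb{Z}, +, 0)$ to the multiplicative group $(\{1, -1\}, \cdot, 1)$; this example and the names `f`, `inv` and `sample` are illustrative choices, not taken from the text, and the check runs over a finite sample only.

```python
# Sketch: the parity map n -> (-1)^n as a group homomorphism
# from (Z, +, 0) to ({1, -1}, *, 1). All names are illustrative.

def f(n):
    return 1 if n % 2 == 0 else -1

def inv(u):
    """Inverse in the group {1, -1}: each element is its own inverse."""
    return u

sample = range(-6, 7)

# Homomorphism property: f(a + b) = f(a) * f(b).
is_hom = all(f(a + b) == f(a) * f(b) for a in sample for b in sample)

# (i) the identity 0 of (Z, +) is sent to the identity 1 of ({1, -1}, *).
identity_ok = f(0) == 1

# (ii) the inverse -a of a is sent to the inverse of f(a).
inverses_ok = all(f(-a) == inv(f(a)) for a in sample)

print(is_hom, identity_ok, inverses_ok)  # True True True
```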

If $A$, $B$ are fields, a ring homomorphism $f : A \to B$ is called a field homomorphism.
A bijective homomorphism between algebraic structures is called an isomorphism.
A.5  Complex  Numbers


It is soon realised that one needs to enlarge the field $\mathbb{R}$ of real numbers in order to consider zeros
of polynomials with real coefficients. The real coefficient polynomial $p(x) = x^2 + 1$
has 'complex' zeros, usually denoted $\pm i$, and their presence leads to defining the field
of complex numbers $\mathbb{C}$. One considers the smallest field containing $\mathbb{R}$, $\pm i$ and all
possible sums and products of them.

Definition A.5.1 The set of complex numbers is given by formal expressions

\[ \mathbb{C} = \{ z = a + ib \mid a, b \in \mathbb{R} \}. \]

The real number $a$ is called the real part of $z$, denoted $a = \Re(z)$; the real number $b$
is called the imaginary part of $z$, denoted $b = \Im(z)$.

The  following  proposition  comes  as  an  easy  exercise. 

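Proposition A.5.2 The binary operations of sum and product defined in $\mathbb{C}$ by

\[ (a + ib) + (c + id) = (a + c) + i(b + d), \]

\[ (a + ib) \cdot (c + id) = (ac - bd) + i(bc + ad) \]

make $(\mathbb{C}, +, 0_{\mathbb{C}}, \cdot, 1_{\mathbb{C}})$ a field, with $0_{\mathbb{C}} = 0_{\mathbb{R}} + i0_{\mathbb{R}} = 0_{\mathbb{R}}$ and
$1_{\mathbb{C}} = 1_{\mathbb{R}} + i0_{\mathbb{R}} = 1_{\mathbb{R}}$. □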

Exercise A.5.3 An interesting part of the proof of the proposition above is to determine
the inverse $z^{-1}$ of the nonzero complex number $z = a + ib$. One easily checks that

\[ (a + ib)^{-1} = \frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2} = \frac{1}{a^2 + b^2}\,(a - ib). \]
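
The inverse formula of Exercise A.5.3 can be checked with exact rational arithmetic. The following Python sketch assumes a complex number is stored as a pair (real part, imaginary part) of `Fraction` values; the helper names `inverse` and `product` and the sample value $3 + 4i$ are illustrative, not part of the text.

```python
from fractions import Fraction

# Sketch: a complex number a + ib is represented by the pair (a, b).
# `inverse` and `product` are illustrative helper names.

def inverse(a, b):
    """Return (Re, Im) of (a + ib)^(-1); a + ib must be nonzero."""
    norm2 = a * a + b * b
    if norm2 == 0:
        raise ZeroDivisionError("0 has no inverse")
    return a / norm2, -b / norm2

def product(z, w):
    """Product rule: (a + ib)(c + id) = (ac - bd) + i(bc + ad)."""
    (a, b), (c, d) = z, w
    return a * c - b * d, b * c + a * d

z = (Fraction(3), Fraction(4))
z_inv = inverse(*z)

print(z_inv)              # (Fraction(3, 25), Fraction(-4, 25))
print(product(z, z_inv))  # (Fraction(1, 1), Fraction(0, 1))
```

Using fractions rather than floating point keeps the check exact, so the product comes out as precisely $1 + i0$.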


Again  an  easy  exercise  establishes  the  following  proposition. 

Proposition A.5.4 Given $z = a + ib \in \mathbb{C}$ one defines its conjugate number to be
$\bar{z} = a - ib$. Then, for any complex number $z = a + ib$ the following properties hold:

(i) $\bar{\bar{z}} = z$,

(ii) $\bar{z} = z$ if and only if $z \in \mathbb{R}$,

(iii) $z\bar{z} = a^2 + b^2$,

(iv) $z + \bar{z} = 2\,\Re(z)$. □
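
The four properties of the conjugate can be spot-checked with Python's built-in `complex` type. This is only a sketch on sample values chosen here for illustration (small integer real and imaginary parts, so the floating-point arithmetic is exact); it does not replace the proof.

```python
# Sketch: spot-check the conjugation properties on sample values.
# The sample values are illustrative, not from the text.

z = complex(3, 4)   # z = 3 + 4i
r = complex(5, 0)   # a real number viewed in C

prop_i = z.conjugate().conjugate() == z             # conjugating twice gives z
prop_ii = (r.conjugate() == r) and (z.conjugate() != z)
prop_iii = z * z.conjugate() == 3**2 + 4**2         # a^2 + b^2 = 25
prop_iv = z + z.conjugate() == 2 * z.real           # twice the real part

print(prop_i, prop_ii, prop_iii, prop_iv)  # True True True True
```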

Exercise A.5.5 The natural inclusion $\mathbb{R} \subset \mathbb{C}$ given by $\mathbb{R} \ni a \mapsto a + i0_{\mathbb{R}}$ is a field
homomorphism, while the corresponding inclusion $\mathbb{R}[x] \subset \mathbb{C}[x]$ is a ring homomorphism.

Remark A.5.6 We mentioned above that the polynomial $x^2 + 1 = p(x) \in \mathbb{R}[x]$ cannot
be decomposed (i.e. cannot be factorised) as a product of degree 1 polynomials
in $\mathbb{R}[x]$, that is, with real coefficients. On the other hand, the identity
$x^2 + 1 = (x - i)(x + i) \in \mathbb{C}[x]$ shows that the same polynomial can be decomposed
into degree 1 terms if the coefficients of the latter are taken in $\mathbb{C}$. This is not surprising,
since the main reason to enlarge the field $\mathbb{R}$ to $\mathbb{C}$ was exactly to have a field
containing the zeros of the polynomial $p(x)$.
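
The factorisation in this remark can be verified by multiplying out the two degree 1 factors. The Python sketch below assumes polynomials are stored as coefficient lists with the lowest degree first; the function `poly_mul` is an illustrative helper written for this check, not a library routine.

```python
# Sketch: polynomials as coefficient lists, lowest degree first.
# `poly_mul` is an illustrative helper, not a library function.

def poly_mul(p, q):
    """Multiply two polynomials by convolving their coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

x_minus_i = [-1j, 1]  # x - i
x_plus_i = [1j, 1]    # x + i

# (x - i)(x + i) should equal x^2 + 1, i.e. coefficients [1, 0, 1].
print(poly_mul(x_minus_i, x_plus_i) == [1, 0, 1])  # True

# Both i and -i are zeros of p(x) = x^2 + 1.
print((1j) ** 2 + 1 == 0, (-1j) ** 2 + 1 == 0)     # True True
```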

What  is  indeed  surprising  is  that  the  field  C  contains  the  zeros  of  any  polynomial 
with  real  coefficients.  This  is  the  result  that  we  recall  as  the  next  theorem. 

Proposition A.5.7 (Fundamental theorem of algebra) Let $f(x) \in \mathbb{R}[x]$ be a polynomial
with real coefficients and $\deg f(x) \geq 1$. Then, $f(x)$ has at least one zero (that is,
a root) in $\mathbb{C}$. More precisely, if $\deg f(x) = n$, then $f(x)$ has $n$ (possibly non distinct)
roots in $\mathbb{C}$. If $z_1, \ldots, z_s$ are the distinct roots, the polynomial $f(x)$ can be written
as

\[ f(x) = a\,(x - z_1)^{m(1)} (x - z_2)^{m(2)} \cdots (x - z_s)^{m(s)}, \]

with the root multiplicities $m(j)$ for $j = 1, \ldots, s$, such that

\[ \sum_{j=1}^{s} m(j) = n. \]

That is, the polynomial $f(x)$ is completely factorisable on $\mathbb{C}$. □

A  more  general  result  states  that  C  is  an  algebraically  closed  field,  that  is  one  has 
the  following: 

Theorem A.5.8 Let $f(x) \in \mathbb{C}[x]$ be a degree $n$ polynomial with complex coefficients.
Then there exist $n$ complex (non distinct in general) roots of $f(x)$. Thus the
polynomial $f(x)$ is completely factorisable on $\mathbb{C}$. □


A.6  Integers  Modulo  a  Prime  Number

We have seen that the integer numbers $\mathbb{Z}$ form only a ring and not a field. Out of
it one can construct fields of numbers by going to the quotient with respect to an
equivalence relation of 'modulo an integer'. As an example, consider the set $\mathbb{Z}_3$ of
integers modulo 3. It has three elements

\[ \mathbb{Z}_3 = \{[0], [1], [2]\} \]

which one also simply writes $\mathbb{Z}_3 = \{0, 1, 2\}$, although one should not confuse the
integers 0, 1, 2 with the classes they represent.

One way to think of the three elements of $\mathbb{Z}_3$ is that each one represents the
equivalence class of all integers which have the same remainder when divided by 3.
For instance, $[2]$ denotes the set of all integers which have remainder 2 when divided
by 3 or, equivalently, $[2]$ denotes the set of all integers which are congruent to 2
modulo 3, thus $[2] = \{\ldots, -4, -1, 2, 5, 8, 11, \ldots\}$. The usual arithmetic operations determine
the addition and multiplication tables for this set as shown in Table A.2.


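\[
\begin{array}{c|ccc}
+ & 0 & 1 & 2 \\ \hline
0 & 0 & 1 & 2 \\
1 & 1 & 2 & 0 \\
2 & 2 & 0 & 1
\end{array}
\qquad \text{and} \qquad
\begin{array}{c|ccc}
* & 0 & 1 & 2 \\ \hline
0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 \\
2 & 0 & 2 & 1
\end{array}
\tag{A.2}
\]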


Thus $-[1] = [2]$ and $-[2] = [1]$, and $\mathbb{Z}_3$ is an abelian group for the addition. Furthermore,
$[1] * [1] = [1]$ and $[2] * [2] = [1]$, so both nonzero elements have an inverse:
$[1]^{-1} = [1]$ and $[2]^{-1} = [2]$. All of this makes $\mathbb{Z}_3$ a field.

The previous construction works when 3 is substituted with any prime number $p$.
We recall that a positive integer $p > 1$ is called prime if it is only divisible by itself and
by 1. Thus, for any prime number $p$ one gets the field of integers modulo $p$:

\[ \mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z} = \{[0], [1], \ldots, [p-1]\}. \]

Each of its elements represents the equivalence class of all integers which have
the given remainder when divided by $p$. Equivalently, each element denotes the
equivalence class of all integers which are congruent modulo $p$. The corresponding
addition and multiplication tables, defined as in $\mathbb{Z}$ but now taken modulo $p$, can be
easily worked out. Notice that the construction does not work, that is $\mathbb{Z}_p$ is not a field,
if $p$ is not a prime number: were this the case there would be divisors of zero.
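
The multiplication table (A.2), the inverses in $\mathbb{Z}_p$ and the divisors of zero for a non-prime modulus can all be computed by brute force. The Python sketch below represents each class by its remainder $0, \ldots, n-1$; the function names `mul_table`, `inverses` and `zero_divisors` and the moduli 3, 5 and 6 are illustrative choices, not from the text.

```python
# Sketch: arithmetic modulo n, each class represented by its remainder.
# Function names are illustrative.

def mul_table(n):
    """Multiplication table of the integers modulo n, as a list of rows."""
    return [[(a * b) % n for b in range(n)] for a in range(n)]

def inverses(n):
    """Map each nonzero class to its multiplicative inverse, when one exists."""
    return {a: b for a in range(1, n) for b in range(1, n) if (a * b) % n == 1}

def zero_divisors(n):
    """Nonzero classes a with a * b = 0 (mod n) for some nonzero class b."""
    return sorted({a for a in range(1, n) for b in range(1, n) if (a * b) % n == 0})

print(mul_table(3))      # [[0, 0, 0], [0, 1, 2], [0, 2, 1]]
print(inverses(3))       # {1: 1, 2: 2}
print(inverses(5))       # {1: 1, 2: 3, 3: 2, 4: 4}
print(zero_divisors(5))  # []
print(zero_divisors(6))  # [2, 3, 4]
print(inverses(6))       # {1: 1, 5: 5}
```

For the primes 3 and 5 every nonzero class has an inverse and there are no divisors of zero, whereas modulo 6 the classes 2, 3 and 4 are divisors of zero and have no inverse.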